DEFINITIONS AND NOTATIONS 


References are to sections of chapters 
summation (1.7) 
n factorial (1.7) 


binomial coefficient (1.7) 


absolute value or modulus (1.7) 

field of rational numbers (2.1, 2.2) 

adjunction, of √2 to R (2.3)

field of real numbers (2.4) 

greatest lower bound (LUB similarly) (2.4) 

complex unit, i² = −1 (2.5)

complex number x + iy = r(cos θ + i sin θ) = re^{iθ}
(2.5, 12.7)

absolute value and argument, of z (2.5) 

field of complex numbers (2.5) 

set of positive integers (natural numbers) (2.6)

integral domain of integers (2.6) 

modulo n (2.7)

polynomials over field F (3.3)

rational fractions, adjunction of a to F (3.4) 

nth root of unity (3.8) 

set (4.1) 

belongs to (4.1) 

proper subset of (4.1) 

complement, of set A (4.2) 

intersection and union, of sets (4.2) 

universal and empty sets (4.2) 

infinite cardinal numbers (4.7, 4.8) 

negation of p (not) (5.1) 

conjunction and disjunction (and, or) (5.1) 

implication, many-one mapping (5.1, 7.3) 

equivalence, one-one mapping (5.1, 7.3) 

probability, of statement a (5.5)

conditional probability, of a, given a, (5.6)

group, with identity e and a⁻¹ inverse of a (6.2)

operator; sr(A) successive operators (r first,
s second) (6.4)

field, with identities 0 and 1 (6.5) 

Cartesian product, of sets (7.1) 

statement of a relation R (7.1) 

function (7.3) 




mapping; transformation (7.3)


isomorphic (7.4) 

function of a complex variable (7.6) 

vector space (8.3) 

space of n-tuples over F (8.4) 
n-dimensional Euclidean space over f’ (8.4) 
cross-ratio, AB.CD / AD.CB

circular points at infinity (8.8)

interval a < x < b (9.3)

neighbourhood, of a (9.3)

inverse function (9.3)

limit of f(x) as x increases without bound (9.5)

limit of f(x) as x approaches a (9.6)
derivative of y=f(x) (10.2) 


definite integral of f(x), area (10.5) 


indefinite integral of f(x), anti-derivative (10.6, 10.8) 


nth derivative, (−n)th integral, of f(x) (10.8)
local maximum of f(x) (minimum similarly) (11.2)
infinite series (11.3)
power series, with radius of convergence r (11.6)
Archimedes’ constant, π = 3.14159... (12.1, 12.5)
Euler’s constant, e = 2.71828... = Lim (1 + 1/n)^n as n → ∞ (12.1, 12.2)
exponential function (12.2) 
logarithmic function (12.3) 
power functions (12.4) 
circular functions (12.5), trigonometric functions 
(12.7) 
hyperbolic functions (12.6) 
matrix (13.4) 
transpose, of matrix (13.5) 
determinant, of matrix (13.6) 
inverse, of matrix (13.6) 
Laplace transform, of y(t) (14.7)
generating function, of Y_n (14.7)




BASIC MATHEMATICS 




BY 
R. G. D. ALLEN 


LONDON 
MACMILLAN & CO LTD 
NEW YORK · ST MARTIN’S PRESS
1962 


Copyright © R. G. D. Allen 1962


MACMILLAN AND COMPANY LIMITED
St Martin’s Street London WC 2
also Bombay Calcutta Madras Melbourne 


THE MACMILLAN COMPANY OF CANADA LIMITED 
Toronto 


ST MARTIN’S PRESS INC 
New York 


PRINTED IN GREAT BRITAIN 


PREFACE 


‘The great weakness of teaching in mathematics is that too much of it is concerned
with training in mathematical jugglery and too little with education in
mathematical ideas’ (D. B. Welbourn).


THE NEW mathematics developed during the last 50-100 years are 
now well entrenched in honours courses at universities: they have 
scarcely made any impression at all on the teaching of mathematics 
in schools. Admittedly, a lag is to be expected here. It is only recently,
both in the United States and in Europe, that the lag has been
recognised as a serious one, as threatening the output of competent
mathematicians at a time when the demand for them is rapidly increasing.
The content of mathematical teaching, particularly at the
critical phase of transition from school to university, is in urgent 
need of drastic change. With the situation deteriorating so obviously 
and so rapidly, several groups of mathematicians got together a few 
years ago in the United States to undertake urgent studies of mathe- 
matical teaching. The Committee on the Undergraduate Mathematical 
Program reported in 1955 and two volumes of Universal Mathematics 
had appeared by 1958. The 1957 Yearbook of the National Council of 
Teachers of Mathematics was devoted to Insights into Modern Mathe- 
matics. A Commission on Mathematics prepared a report which was 
published by the College Entrance Examination Board in 1959. A 
little later, the Organisation for European Economic Co-operation
sponsored a seminar and a survey on the teaching of mathematics in
schools in Europe and published their report, entitled New Thinking
in School Mathematics, in May 1961. Discussions in Britain resulted
in a report: On Teaching Mathematics (Editor: Bryan Thwaites),
published by Pergamon Press in July 1961. The present text is 
designed primarily to assist in the promotion of the necessary reforms 
in the basic teaching of mathematics. 

The point which I wish to stress above all others is that mathe- 
matics is an exciting, enjoyable and challenging study. To one who 
takes a broad view, mathematics not only has great range and power;
it also has essential unity and simplicity. To one who responds to a
challenge, mathematics presents a vast field to explore, with boundaries 
both ill-defined and always expanding. To one who would have things 
done elegantly and economically, mathematics is a most satisfying 
discipline. Mathematicians aim at the best formulation, both as a 
rewarding exercise in itself, and to provide a foundation on which the 
super-structure of mathematical applications can be safely con- 
structed. A preference for a beautifully neat and sound development, 
as opposed to a set of correct but messy proofs, is something much 
to be encouraged. 

It is essential to penetrate the smoke screen laid down by so many 
mathematicians — their jargon and the devices they employ in 
‘solving problems’. Potential mathematicians need to appreciate the 
basic structure of mathematics, particularly with a view to the proper 
formulation of mathematical models in the sciences. They should be 
taught, not merely how, but why things are done in mathematics. 
Their education should be concentrated on mathematical ideas; 
dexterity in ‘mathematical jugglery’ in problem solving comes by 
experience. I have attempted to provide a short course on these 
lines, for a variety of people. They may be in their freshman year, 
beginning university courses in the natural, applied or social sciences.
They may be in graduate schools, having discovered rather late that 
their subjects are treated mathematically. They may be teachers in 
schools, attempting to get across the basic mathematical ideas and 
to fire the imagination of young mathematicians. 

No grounding in elementary mathematics at any of the recognised 
school levels is assumed of the reader of this text. The simplest 
algebraic processes, and the ideas of graphs and mensuration, must of 
course be appreciated. The main qualification for reading on, how- 
ever, is a well-developed logical sense and a desire to avoid the loose 
thinking which so often passes for mathematical exposition. The course 
presented is perhaps better followed under expert guidance rather 
than in private reading. A reader should have examples appropriate 
to his need suggested to him, and he should be given some indication 
of how basic mathematical ideas apply in his own field of study. Even 
so he must work hard, painfully sorting out his ideas; he is master 
of what he discovers for himself. I do not say that anyone who completes
this course is thereby fully prepared to go on to such technical
studies as matrix algebra or differential equations. I do maintain,
however, that he will be much better prepared to do so. 


London School of Economics                    R. G. D. ALLEN
July 1961 


ACKNOWLEDGEMENTS 


I THANK Dr. Constance Rigby of University College London for her
sympathetic reading of the original typescript; I have made many
changes as a result of her diligent and expert criticisms. I am indebted
to Mr. Maurice Peston of the London School of Economics for several
important suggestions. Neither of them is at all to blame for the in- 
adequacies of the text. 


PLAN OF CHAPTERS

[Chart of the plan of chapters. Legible boxes: Chap. 1 Preliminaries; Chap. 2 Number Systems; Chap. 3 Polynomials; Chap. 4 Sets; Chap. 6 Groups and Fields; Chap. 7 Relations and Functions; Chap. 8 Geometries; Chap. 9 Limits and Continuity; Chap. 10 Calculus; Chap. 12 Elementary Functions; Chap. 13 Linear Algebra. The two columns of the chart are labelled Mainly Algebra and Mainly Analysis.]


CONTENTS

1. PRELIMINARIES
1.1. The Traditional Subjects of Mathematics
1.2. The Axiomatic Approach
1.3. Elementary Algebra in Terms of Sets
1.4. Some Examples
1.5. Variables, Constants and Parameters
1.6. Unresolved Problems in Elementary Algebra
1.7. Notation
1.8. References
1.9. Exercises

2. NUMBER SYSTEMS
2.1. Rational Numbers
2.2. The Operational Rules and Order Properties
2.3. A Wider Field of Numbers
2.4. Real Numbers
2.5. Complex Numbers
2.6. Integers
2.7. Finite Sets of Integers
2.8. The Binary System
2.9. Exercises

3. POLYNOMIALS
3.1. The Fundamental Theorem of Arithmetic
3.2. Gaussian Integers
3.3. Polynomials
3.4. Rational Fractions
3.5. Polynomial Functions
3.6. Roots of Polynomial Equations
3.7. The Fundamental Theorem of Algebra
3.8. The nth Roots of Unity
3.9. Exercises

4. SETS
4.1. The Basic Concept of a Set
4.2. Operations on Sets
4.3. The Operational Rules for Sets
4.4. Boolean Algebra
4.5. Counting Sets
4.6. Finite Sets
4.7. Countably Infinite Sets
4.8. Transfinite Arithmetic
4.9. Exercises

5. STATEMENTS AND PROBABILITY
5.1. Statements
5.2. Statements and Sets
5.3. Necessary and Sufficient Conditions
5.4. Probability
5.5. Probability Measure
5.6. Properties of Probability Measure
5.7. Examples of Probability Measure
5.8. Finite Stochastic Processes
5.9. Exercises

6. GROUPS AND FIELDS
6.1. The Structure of a Set
6.2. Groups
6.3. Transformations
6.4. Groups of Transformations
6.5. Fields
6.6. Algebraic Numbers
6.7. Ordered Fields
6.8. Inequalities
6.9. Exercises

7. RELATIONS AND FUNCTIONS
7.1. Relations
7.2. Equivalence
7.3. Functions and Mappings
7.4. Isomorphism
7.5. Linear Transformations
7.6. Conformal Transformations
7.7. Order
7.8. Properties of Order
7.9. Exercises

8. GEOMETRIES
8.1. Various Geometries
8.2. Metric Geometry and Vectors
8.3. Vector Spaces
8.4. Euclidean Space
8.5. Non-Euclidean Spaces
8.6. Co-ordinate Geometry
8.7. Projective Geometry
8.8. Homogeneous Co-ordinates
8.9. Exercises

9. LIMITS AND CONTINUITY
9.1. Functions of a Real Variable
9.2. Algebraic and other Functions
9.3. The Algebra of Functions
9.4. Limits of Sequences
9.5. The Limit Process
9.6. Limits of Functions
9.7. Continuity
9.8. Properties of Limits and Continuity
9.9. Exercises

10. CALCULUS
10.1. Some Examples
10.2. Derivatives
10.3. Operational Rules for Derivatives
10.4. Areas
10.5. Integrals
10.6. The Fundamental Theorem of the Calculus
10.7. Integration in Practice
10.8. Derivatives and Integrals as Operators
10.9. Exercises

11. EXPANSIONS
11.1. Taylor’s Series
11.2. Maximum and Minimum Values
11.3. Convergence of Series
11.4. Series of Positive Terms
11.5. Absolute and Conditional Convergence
11.6. Power Series
11.7. Expansions of Functions
11.8. Properties of Expansions
11.9. Exercises

12. ELEMENTARY FUNCTIONS
12.1. Defining New Functions
12.2. The Exponential Function
12.3. The Logarithmic Function
12.4. Power Functions
12.5. Circular Functions
12.6. Complex Exponents
12.7. Trigonometric Functions
12.8. Summary of Results
12.9. Exercises

13. LINEAR ALGEBRA
13.1. The Basis of Linear Algebra
13.2. The Structure of Vector Spaces
13.3. Linear Transformations and Linear Equations
13.4. Matrices
13.5. Operational Rules for Matrices
13.6. Square Matrices
13.7. The Rank of a Matrix
13.8. Solution of Linear Equations
13.9. Exercises

14. LINEAR SYSTEMS
14.1. Linear Algebraic Systems
14.2. Linear Differential Equations
14.3. Solution of Linear Differential Equations
14.4. Oscillatory Movements
14.5. The Use of the Operator D
14.6. Linear Difference Equations
14.7. Laplace Transforms
14.8. Linear Models
14.9. Exercises

15. SOME FORMAL DEVELOPMENT
15.1. From Integers to Real Numbers
15.2. Polynomials: the Fundamental Theorem of Algebra
15.3. Sets, Groups, Fields and Vector Spaces
15.4. Limits and Continuity
15.5. Integrals: the Fundamental Theorem of the Calculus
15.6. Absolute and Uniform Convergence
15.7. Exponential and Logarithmic Functions
15.8. Circular Functions
15.9. Linear Algebra

APPENDIX. FORMULAE OF ELEMENTARY ALGEBRA AND TRIGONOMETRY
A.1. Powers and Exponents
A.2. Logarithms
A.3. Roots of Polynomial Equations
A.4. Solution of Two Linear Equations
A.5. Completing the Square
A.6. Clearing the Denominator
A.7. Trigonometric Ratios
A.8. Triangles
A.9. Cartesian and Polar Co-ordinates

EXERCISES: SOLUTIONS
INDEX


Note: * indicates exercises which are either difficult or involve new developments 
rather than illustrations of the text. 


CHAPTER 1 


PRELIMINARIES 


‘It might be as well to recall the professor who insisted that the essence of good 

teaching was always to tell the truth, but the whole truth only when students 

were mature enough to receive it.’ Report of the Commission on Mathematics 
(College Entrance Examination Board, N.Y., 1959), Appendices, p. 64. 


1.1. The traditional subjects of mathematics. In the teaching of 
mathematics in schools, it is customary to pass from very elementary 
work in arithmetic and mensuration to separate treatments of 
algebra and geometry, a little trigonometry being thrown in for good 
measure. Courses for more specialist students then develop into 
analysis, but there are still several compartments with separate 
labels: calculus, co-ordinate geometry, differential equations, and 
so forth. 

To a schoolboy taught in this way, algebra must appear as an 
extension of arithmetic in which x stands for ‘an unknown number’ 
and in which more-or-less realistic problems are posed to provide 
exercises in ‘finding x’. On the other hand, geometry is a drill in
logical argument from an axiomatic basis through a long series of
theorems, in the well-known Euclidean sequence, embellished by
exercises in the form of riders on particular theorems. There may be 
some contact with reality at various stages, e.g. in dealing with 
volumes of solids or with trigonometric aspects of surveying. 

It can be agreed that the ‘whole truth’ is not presentable at this 
stage in mathematical teaching; even so, is this the ‘truth’ about 
mathematics? It may be claimed that the traditional approach has 
worked well enough and that it has a certain logical convenience. 
But it is surely worth inspection. 

At the practical level, algebra deals with the numerical aspects of 
things, and geometry with configurations in space, the two requiring 
different treatment. We are, however, somewhat shaken on dis- 
covering that, since geometry deals largely with distances, angles, 
areas and other measurements, the subject is numerical and wide
open to algebraic treatment. We become almost schizophrenic in
swinging from the abstract logical argument of ‘pure geometry’ to 
the algebra of ‘co-ordinate geometry’. A good example of pure logic 
is to be found in projective geometry with its applications to per- 
spective, to the deduction of properties of things from their shadows. 
But, when we come to investigate the properties of lines, circles, 
parabolas and other curves, we find it much easier to proceed in 
algebraic terms. 

On teaching grounds, it may be said that mathematicians need to 
learn two things: to argue with strict logic from premises to conclusions;
to acquire the right tricks for solving all kinds of problems
as they arise. Algebra, as customarily taught, is a pretty good exercise 
in manipulation; geometry, on the Euclidean model, provides the 
necessary counter-weight, a training in logical reasoning from a 
postulational basis. Drop Euclid from the school curriculum and the 
teaching of mathematics becomes a matter of opening up a series of 
boxes of tricks; there would be none of the discipline of the axiomatic 
approach.

There is a flaw in the argument. Apart from minor difficulties, 
Euclidean geometry has one devastating defect as a complete and 
consistent axiomatic treatment. There is no postulate on the order 
of points in space, nothing to distinguish whether or not one point is 
between two other points. The difficulty is illustrated by the well- 
known ‘proof’ that every triangle is isosceles. In the triangle ABC, 
let AP be the bisector of the angle BAC and DQ the perpendicular 
bisector of the side BC, meeting in O, as in (i) of Fig. 1.1. Drop 
perpendiculars OE on the side AC and OF on AB. By this construction,
OBD and OCD are congruent right-angled triangles and so are
OAF and OAE. Hence: 


OB = OC;  OF = OE;  AF = AE.
From the first two of these, OBF and OCE are congruent right- 
angled triangles so that 


FB=EC 
and AF=AE 
already established. Adding: 

AB=AC 


and the triangle ABC is isosceles.


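[Fig. 1.1. Diagrams of the triangle ABC with the lines AP and DQ meeting in O, illustrating the cases considered in the text.]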

The fallacy is in the last line. Can we add as indicated? There are 
three general possibilities and one special (or degenerate) case to 
consider, all illustrated in the figures: 

(i) F is between A and B and E is between A and C.

Then: AB = AF + FB = AE + EC = AC.

(ii) F is not between A and B and E is not between A and C.
Suppose F is beyond B and E beyond C as illustrated.

Then: AB = AF − FB = AE − EC = AC.

The same result follows in the only other possible situation, i.e. F
and E beyond A.

(iii) F is not between A and B and E is between A and C (or conversely).
In the case illustrated:

AB = AF − FB = AE − EC but AC = AE + EC

and so AB ≠ AC. A similar result holds in the converse case where F
is between A and B and E is not between A and C.

(iv) AP and DQ are parallel or coincident, in which case O does not
exist and the whole construction breaks down.

Which of these possibilities can hold? In fact, it can only be (iii) or
(iv). If the triangle ABC is isosceles, then AP and DQ coincide, case
(iv). If it is not isosceles, then the only possibility is (iii). But it is not
possible to prove this by Euclidean geometry in default of any concept
of the order of points on a line.
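
The claim that only cases (iii) and (iv) can occur is easy to check numerically once co-ordinates are allowed, which is exactly the notion of order that the Euclidean argument lacks. The following is a minimal sketch in Python, not part of the original text: the function names, the tolerance eps and the sample triangles are illustrative assumptions. It places O on the bisector AP and on the perpendicular bisector DQ, then tests whether the feet F and E fall between the vertices.

```python
import math

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def foot_parameter(a, b, p):
    """Return t where the foot of the perpendicular from p to line ab is a + t*(b - a)."""
    ab = sub(b, a)
    return dot(sub(p, a), ab) / dot(ab, ab)

def classify(a, b, c, eps=1e-12):
    """Return (F between A and B, E between A and C), or None in case (iv)."""
    ub, uc = unit(sub(b, a)), unit(sub(c, a))
    d = (ub[0] + uc[0], ub[1] + uc[1])          # direction of the bisector AP
    m = ((b[0] + c[0]) / 2, (b[1] + c[1]) / 2)  # D, the mid-point of BC
    bc = sub(c, b)
    denom = dot(d, bc)
    if abs(denom) < eps:
        return None                             # AP and DQ parallel or coincident
    s = dot(sub(m, a), bc) / denom              # O = A + s*d lies on DQ
    o = (a[0] + s * d[0], a[1] + s * d[1])
    t_f = foot_parameter(a, b, o)               # position of F along AB
    t_e = foot_parameter(a, c, o)               # position of E along AC
    return (0 < t_f < 1, 0 < t_e < 1)

print(classify((0, 0), (4, 0), (1, 3)))   # (True, False): case (iii)
print(classify((0, 0), (1, 0), (0, 5)))   # (False, True): case (iii), conversely
print(classify((0, 3), (-2, 0), (2, 0)))  # None: isosceles, case (iv)
```

In both scalene examples the foot of the perpendicular lies inside the longer of the sides AB, AC and outside the shorter, so the final addition in the 'proof' is not permitted.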

1.2. The axiomatic approach. Faced with this situation, we may try 
simply to plug the gap in Euclidean geometry. But is there not a 
similar problem of order for numbers and may it not be better to pay 
attention to the postulational basis of the number system and, 
indeed, of the whole of algebra? Why need we confine the axiomatic 
approach, even in elementary teaching, to the subject of geometry? 

Mathematics is not a closed book. It has been growing vigorously 
over the centuries and there is still plenty of room left for develop- 
ment. In the past 50 years, it has become increasingly clear that the 
axiomatic basis of mathematics is not just an exercise for academic 
mathematicians and logicians. It serves to simplify, to unify and to 
generalise, to cast an illuminating light on the whole structure of 
mathematics and its applications. At the same time, mathematical 
formulations in the sciences — for example in such different fields as 
economics and physics — have become more abstract and sophisti- 
cated. It is now quite essential that the assumptions of mathematical 
models should be made precise and explicit, that the principles of 
model-building should be made clear. 

The object of the following chapters is to expose the simple and 
uniform foundations of mathematics. In this we must be sure we have 
the truth and nothing but the truth — even if the whole truth some- 
times eludes us if we are not to get bogged down in over-elaborate 
detail. It is apparent that, for appropriate application of mathematics 
as well as for our general satisfaction, it pays us to be careful and 
precise in formulation.* A word of warning: we can see the simplicity 
(as well as the uniformity and generality) of a sound axiomatic 
development once it is written down, but it is not easy to achieve. 
Advances are made by a nice combination of intuition and logic. 
Intuition suggests what is to be established and the lines on which to
lay out proofs. The best formulation, i.e. the most strict and 
economical, and the most revealing, is a matter for logical thought 
and experiment. The refinement and exposition of the concepts here 
described are the outcomes of the work of countless brilliant mathematicians
over the decades, indeed over the centuries. In approaching
them we must be prepared to exercise our logical powers, but we
should also leave plenty of scope for intuition.

* Hardy remarks in the preface to the third edition (1921) of his classic Pure
Mathematics: ‘It is curious to note how the direction of the criticisms I have had to
meet has changed. I was too meticulous and pedantic for my pupils of fifteen years
ago: I am altogether too popular for the Trinity scholar of today.’

We should forget the particular applications of mathematics with 
which we may be familiar and hence the particular division of subject 
matter into algebra, geometry, trigonometry, and so on. We are 
concerned with ideas which run through all mathematics. In tackling 
a problem, we may be used to starting: ‘let x be any number’ satisfying
this or that condition; or ‘let P be any point’ on this or that
locus. The basic idea here is: ‘let x or P be any member of a set X’,
leading to a study of the set X, its specification, structure and 
properties. We must be prepared to have X as a set of entities of any 
kind. It may be a set of numbers, e.g. all positive integers or all real 
numbers. It may be a set of points in a plane, or the corresponding 
set of number pairs (the co-ordinates of a point). Or it may be a set 
of quite different entities, e.g. a set of operators or transformations. 
The vital matter is the structure of the set: how are the members 
related and how can they be combined one with another? It is a 
pleasant surprise to discover that sets of very different entities can 
have much the same structure. 

There are other reasons for giving up the usual division of mathe- 
matics into subjects. Apart from the fact that we can sweep the lot 
together in an axiomatic approach, we can say that the compartments
have never proved to be water-tight. Some simple equations of
algebra, e.g. x² − 2 = 0 or x² + 2 = 0, cannot be solved within the
system of rational numbers which characterises algebra; they need 
the real and complex number system of analysis. Further, algebraic 
expressions are often treated graphically, i.e. in terms of the geometric 
properties of curves. Conversely, much of geometry is best handled 
in algebraic form and with the aid of calculus. Such a simple geometric 
idea as the circumference or area of a circle involves a number π of a
most sophisticated kind, a number which is not only ‘irrational’ but 
also ‘transcendental’, being the root of no polynomial equation. 
Again, the functions we first meet as trigonometric ratios appear in 
the most unlikely quarters, as sums of series and in the disguise of 
an area or an integral. And so on, back and forth between one subject 
and another. 
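
The remark that equations such as x² − 2 = 0 and x² + 2 = 0 have no solution among the rational numbers can be checked mechanically. The sketch below is an illustration only and is not the text's own method: it assumes the standard rational root test (a root p/q in lowest terms of a polynomial with integer coefficients must have p dividing the constant term and q dividing the leading coefficient), and the function name rational_roots is hypothetical. It assumes integer coefficients with a non-zero constant term.

```python
from fractions import Fraction
import cmath

def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a polynomial with integer coefficients.

    coeffs lists the coefficients from the highest power down;
    the constant term is assumed to be non-zero.
    """
    lead, const = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(const)
                  for q in divisors(lead)
                  for sign in (1, -1)}

    def value(x):
        result = Fraction(0)
        for c in coeffs:          # Horner's rule, exact arithmetic
            result = result * x + c
        return result

    return sorted(x for x in candidates if value(x) == 0)

print(rational_roots([1, 0, -2]))   # []  (x^2 - 2 = 0 has no rational root)
print(rational_roots([1, 0, 2]))    # []  (x^2 + 2 = 0 has no rational root)
print(rational_roots([1, 0, -4]))   # [Fraction(-2, 1), Fraction(2, 1)]

# In the complex number system x^2 + 2 = 0 does have roots, e.g.
z = cmath.sqrt(-2)
print(z, z * z + 2)
```

The first two polynomials need the real and the complex numbers respectively; the third is included only to show that the test does find rational roots when they exist.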

Naturally, classification is one of the objects, and the delights, of
mathematics. It just happens that classification by subject matter
of application is not very useful in developing mathematical ideas. 
A better classification by far is the following. There is finite mathe- 
matics, dealing with finite sets, with stochastic processes and Markov 
chains, with vectors and matrices and with ‘linear algebra’ generally. 
There is the mathematics of the countably infinite, associated with 
the number systems of the integers and rationals, with particular 
reference to series and sequences. Then there is the most powerful 
mathematics of all: mathematical analysis, involving the continuum 
of real (and complex) numbers, leading through the idea of limits to 
the infinitesimal calculus.* 

* ‘Calculus’ is derived from the Latin: calx = stone. Modified by the diminutive
‘ulus’, it means ‘small stone’ as used in reckoning on an abacus. So calculus, or
calculation, is any kind of reckoning. The adjectival qualification ‘infinitesimal’ is
needed to indicate calculation with infinitesimal or continuous variation. Usually the
‘infinitesimal calculus’ is referred to simply as the ‘calculus’.

At this point, the geometers may raise a howl. The ideas and
methods of geometry seem to be swallowed up by algebra and
analysis. And so they are, once we think of a locus as a set of points,
and choose to represent points in a plane by number pairs. On the
other hand, we do find it convenient and helpful to keep the link
between points and numbers always in mind. Geometric properties
are translated into algebraic terms and, conversely, algebraic developments
are illustrated visually in graphical or geometric terms.
We can cater equally for those with an algebraic and those with a
geometric turn of mind.

At all stages of a mathematical development, our constant aim must
be to be both as precise and as general as we can. We need to be most
clear on what we are doing in an axiomatic approach when we get
down to fundamentals, when we start on such a major undertaking
as the construction of the theory of sets or of the real number system.
Clearly, no matter how deep we go in mathematics, we must have
something to build upon, some straw for our brick-making. The basic
discipline we must assume is the system of logical operations, framed
in appropriate language and symbols. This is the logic of statements
involving negation, disjunction, conjunction and implication, expressed
by the words ‘not’, ‘or’, ‘and’ and ‘if...then...’ and
written when convenient (as in 5.1 below) in terms of the symbols
∼, ∨, ∧ and →. On the axiomatic method, we proceed to add something
new, something specific to the development in hand, and to
be quite precise in formulating what is new. We first introduce 
certain primitive (undefined) concepts or relations and we follow 
these by defining other concepts or relations. Next, the concepts or 
relations are made subject to a system of axioms in the form of 
statements spelled out precisely to describe consistently and com- 
pletely what properties we wish our new concepts to have. Only 
then are we ready to embark on the process of establishing further 
and consequent statements, i.e. to prove a sequence of theorems. 
The mathematical development, axiomatised in such a way, becomes 
a formal system or model.* 

* A formal system developed strictly on the axiomatic method would be highly
symbolised and a rather arid affair. Here we compromise in order to provide a general
exposition which explains what is going on. It might be described as ‘informal
axiomatics’. See Church: Introduction to Mathematical Logic (Princeton, 1956).

The ideas and methods involved in these chapters are essentially
quite simple. It is true that they must be pursued with a certain
determination, particularly when topics arise which are traditionally
classified as ‘advanced’. For example, having developed real numbers,
we can go on quite naturally to complex numbers. It would be fatal
to the whole approach to think of them as more difficult or as ‘complex’;
the label is indeed quite misleading. Looking back at the end,
we shall see only two tough stretches (the definition of a real
number and the consequent development of the idea of a limit) and
we shall find only three or four results which are really difficult to
establish. One is the fundamental theorem of algebra that a polynomial
equation of the nth degree has precisely n roots. Another is
the fundamental theorem of the calculus which establishes the inverse
relationship between derivation and integration, and hence between
the two apparently different applied processes of evaluating rates of
change and areas.


1.3. Elementary algebra in terms of sets. As a preliminary canter
over the field, we can attempt a re-interpretation of the elementary
algebra of school text-books with emphasis on the basic concept of a
set. The object is to make clear the logical foundation of algebra and
to dispose of the notion that, in algebra as elsewhere in mathematics,
the main thing is to acquire all the ‘tricks of the trade’. We cannot
dispense with tricks in mathematical work and some of the simpler
ones are set out in the Appendix. A part of the skill of a practising
mathematician lies in the box of tricks he has at his disposal; he 
needs to be expert in producing the right trick to solve the right 
problem. But it is much more important that he keeps always in 
mind the fact that everything he does is part of a broad and basic 
pattern, that the tricks he employs are not arbitrary and unrelated. 

The way in which a ‘formula’ arises in mathematics can be shown 
by a simple example. Consider the assertion: the month of January 
has 31 days. Convert it into an open statement: the month of x has 
31 days. What is x? We must be quite explicit. Here, we need to say 
that x stands for any one of the twelve months of the year. More precisely:
x is any member of the set X comprising the twelve months.
The statement is true of some members of X and false for others. 
The set X is the replacement set of the symbol x; we are entitled, in 
the statement, to replace x by any member of X we care to pick. 
More technically: the set X is the domain of the variable x. It is 
essential to know exactly what domain every variable has. In 
practice, the domain is often left to be understood from the context; 
it is always a good discipline to bring it out and to make it quite 
explicit. 

The statement can be developed further: the month of x has y days.
The domain X of the variable x is still the set of twelve months.
What is y? Clearly y is another variable, but dependent in some way 
on the original variable x. It must also be a member of an appropriate 
set. Here, if leap years are ignored, y is a positive integer, one of a set 
of three {28, 30, 31}. Replace x by a particular member of its domain, 
and y is uniquely determined: either 28, or 30, or 31. The dependent 
variable y corresponds to a specific set Y, here {28, 30, 31}, and Y is 
called the range of y. 

We can now appreciate what we have done in generalising our 
original assertion into the statement: y is the number of days in the 
month x. We have specified two sets: X comprising the twelve 
months of the year and Y consisting of three integers. The sets are 
related and the statement is a specification of how the linking is 
achieved. It gives a rule or a formula for going from a member of X 
to the corresponding member of Y. 
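
The linking of the two sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names DAYS_IN_MONTH, domain_X, range_Y and months_with_31 are invented for the sketch, and leap years are ignored as in the text.

```python
# Rule linking each month x (a member of X) to its number of days y (a member of Y).
# Leap years are ignored, as in the text. All names here are illustrative.
DAYS_IN_MONTH = {
    "January": 31, "February": 28, "March": 31, "April": 30,
    "May": 31, "June": 30, "July": 31, "August": 31,
    "September": 30, "October": 31, "November": 30, "December": 31,
}

domain_X = set(DAYS_IN_MONTH)          # the replacement set of x
range_Y = set(DAYS_IN_MONTH.values())  # the values actually taken by y

# The open statement "the month of x has 31 days" is true for some x only.
months_with_31 = {x for x, y in DAYS_IN_MONTH.items() if y == 31}
```

The dictionary plays the part of the rule or formula: given a member of X, it returns the one corresponding member of Y.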

In elementary algebra, we confine ourselves, rather strictly, to 
sets of numbers. We make generalised statements which we condense 




into formulae, of the well-known ‘algebraic’ type, involving certain 
numerical variables. Indeed, we tend to rush too quickly to the 
formula — too quickly to keep in mind precisely what we are doing. 
Too much emphasis is on the formula, too little on the sets being
related. It is true that the formula is very interesting in itself, taking 
a variety of forms as illustrated below. The need remains, however, 
to specify what sets we deal with, before the precise nature of the 
formula connecting them can be appreciated. 


1.4. Some examples 

(i) Let x be any rational number, an integer or fraction (positive, 
negative or zero). Let y be the number obtained by taking twice the 
square of x, by subtracting x and then by subtracting 3. This rigmarole
is condensed to the formula: $y = 2x^2 - x - 3$. It is an expression of
y in terms of x. Before we start to play around algebraically with 
the expression, we should pause to consider what sets we have. The 
domain X of the variable x is the set of all rational numbers, as 
opposed (for example) to the narrower set of integers or the wider 
one of real numbers. The values of y make up another set Y. The 
expression shows that y is always a rational number. What we do not 
yet know, and must find out, is whether the set Y comprises all or 
only some of the whole set of rationals. When this has been determined,
the expression can be interpreted precisely as a formula for
relating the set X to the set Y, for picking a member of the domain
X and writing down the corresponding member of the range Y.

The domain X is the replacement set of the variable x. Pick any
rational x from X and obtain the value of the expression by substitution
of this x. For example, $y = 2(\tfrac12)^2 - (\tfrac12) - 3 = \tfrac12 - \tfrac12 - 3 = -3$ when
$x = \tfrac12$. In this way, a table of corresponding y’s for selected x’s is
built up:


x | -2   -1   -1/2    0    1/4    1/2    1   3/2    2    3
y |  7    0    -2    -3  -25/8    -3    -2    0     3   12


The table can be filled out by insertion of more and more entries. To
study the nature of the expression $y = 2x^2 - x - 3$, we find it helpful
to draw a graph from the table of corresponding x’s and y’s and to 
observe the shape of the ‘curve’ shown. To plot the graph, take two 




axes Ox and Oy and an appropriate scale of measurement on each. 
It is usual to draw Ox horizontal and Oy vertical (Fig. 1.4). Given 
an x and associated y from the table, plot a point by going a distance 
x from O along Ox (to the right for positive x, to the left for negative
x) and then by going a distance y parallel to Oy (upwards for positive
y, downwards for negative y). The set of 
paired values of x and y in the table is then 
translated into a set of points on the graph. 
In this case, from Fig. 1.4, it appears that 
the plotted points lie on a smooth curve 
(confirmed later by theory) and a free-hand 
sketch of it can be included in the graph. 

We now have our answer to the question
about the range Y of the expression. The graph
indicates that, as x is allotted various rational
values, so y takes all rational values except those less than a certain
minimum. The smallest y is $-25/8$ when x is 1/4. We seek to establish
this algebraically and an appropriate way is easily found (Appendix
A.5). So the expression $y = 2x^2 - x - 3$ relates the set X of all rationals
(the domain of x) to the set Y of all rationals not less than $-25/8$
(the range of y).
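
The table and the minimum can be checked mechanically. The sketch below uses exact rational arithmetic; the helper names y, completed_square, table and smallest are assumptions made for illustration. Appendix A.5 is not reproduced here, so the completing-the-square identity in the code is our own assumption about a suitable algebraic route, not a quotation of the appendix.

```python
from fractions import Fraction as F

def y(x):
    """The expression y = 2x^2 - x - 3 for a rational x."""
    return 2 * x * x - x - 3

# Selected members of the domain X (rationals), as in the table.
xs = [F(-2), F(-1), F(-1, 2), F(0), F(1, 4), F(1, 2), F(1), F(3, 2), F(2), F(3)]
table = [(x, y(x)) for x in xs]

# Completing the square (an assumed method): 2x^2 - x - 3 = 2(x - 1/4)^2 - 25/8,
# so no value of y can fall below -25/8.
def completed_square(x):
    return 2 * (x - F(1, 4)) ** 2 - F(25, 8)

# The two forms agree at every tabulated x; this checks the sample, it is not a proof.
identity_holds = all(y(x) == completed_square(x) for x in xs)
smallest = min(value for _, value in table)
```

Since the squared term is never negative, the smallest tabulated value, found at x = 1/4, is also the smallest value anywhere in the domain.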

(ii) Consider the statement: think of a number (a positive integer),
double it, add 4, divide the result by 2, subtract the number first 
thought of, and the answer is always 2. Write x for the optional 
number, with domain X as the set of all positive integers. The state- 
ment reduces to the formula: 

$\tfrac12(2x + 4) - x = 2$ for all x in the set X.


The formula here is an identity, true for all x’s of the specified
domain.* One way of expressing this fact is to write the expression
$y = \tfrac12(2x + 4) - x$ defined for all x of X, and to show that the range of
y is a set consisting of one item only, the integer 2.
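
A short check of the ‘think of a number’ identity is sketched below. The function name think_of_a_number and the choice of the first thousand positive integers as a sample of X are assumptions; a finite sample illustrates the identity but does not prove it.

```python
from fractions import Fraction

def think_of_a_number(x):
    # Double it, add 4, divide the result by 2, subtract the number first thought of.
    return Fraction(2 * x + 4, 2) - x

# Range of y over a finite sample of the domain X (positive integers 1..1000).
range_Y = {think_of_a_number(x) for x in range(1, 1001)}
```

Every member of the sample leads to the same value, so the computed range has a single item.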

[Fig. 1.4]

As a common example of the use of identities, let x be any rational
and consider the sequence of statements:


$y = 2x^2 - x - 3$
$2x^2 - x - 3 = (x + 1)(2x - 3)$
and so: $y = (x + 1)(2x - 3)$.


* In an identity the sign = must be interpreted as ‘identically equal’; it is
sometimes written as ≡.




The second of these is an identity. The other two are equivalent ways 
of writing the expression y. This is the familiar process of factorising. 

(iii) Let x be any rational number and consider the statement that
twice the square of x is equal to x plus 3. This may be true for some
x in the domain X of all rationals, but it is certainly false for others.
The statement can be written as the equation:


$2x^2 = x + 3$ or $2x^2 - x - 3 = 0$


to be considered over the domain X of all rationals. 

Write the expression $y = 2x^2 - x - 3$ and reverse the procedure of
(i). Instead of substituting various x’s and writing corresponding y’s,
we now ask: given that y is zero, what x’s in the domain X will do?
The graphical approach works again. From Fig. 1.4, we locate two
x’s which will do, the values $-1$ and 3/2. In seeking to establish this
algebraically, we find the factorisation of (ii) turns the trick: since
$y = (x + 1)(2x - 3)$, it follows that $y = 0$ only when $x = -1$ or $x = 3/2$.

The achievement here is to start with the set X of all rationals and
to narrow it down to a set of two rationals $\{-1, 3/2\}$ for each of
which the equation $2x^2 - x - 3 = 0$ is valid. This narrower set of valid
x’s is the solution set of the equation. In writing $x = -1$ or $x = 3/2$, we
have ‘solved’ the equation $2x^2 - x - 3 = 0$.

(iv) Vary the statement to read: twice the square of x is less than
x plus 3. Again this is the kind of statement true for some and false
for other x. The formula is now an inequality:


$2x^2 < x + 3$ or $2x^2 - x - 3 < 0$


to be considered over the domain X of all rationals. 

The same question arises: what x’s in the domain X will do? In
the graphical approach (Fig. 1.4), we seek those x’s which make
$y = 2x^2 - x - 3$ negative and for which the curve falls below the axis
Ox. The answer is apparent: all x’s in the interval between $-1$ and
3/2. The algebraic proof is to be supplied: since $y = (x + 1)(2x - 3)$,
it follows that $x < -1$ gives $y > 0$, $-1 < x < 3/2$ gives $y < 0$, and
$x > 3/2$ gives $y > 0$ again. The set of valid x’s is the solution set of the
inequality. Once again, the formula (inequality here) enables us to
cut down the set X of all rationals to a smaller set. In this case, the
(smaller) solution set happens to be the set of all rationals x such
that $-1 < x < 3/2$. We say that the inequality $2x^2 - x - 3 < 0$ holds
for rational x in the interval $-1 < x < 3/2$.
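
The idea of cutting a domain down to a solution set can be sketched as a filter. The sample of rationals used below (multiples of 1/4 between $-3$ and 3) and the names sample_X, equation_solutions and inequality_solutions are assumptions for illustration; the true domain X is infinite and cannot be listed.

```python
from fractions import Fraction as F

def y(x):
    return (x + 1) * (2 * x - 3)   # factorised form of 2x^2 - x - 3

# A finite sample of the domain X: all multiples of 1/4 from -3 to 3.
sample_X = [F(k, 4) for k in range(-12, 13)]

# Members of the sample for which the equation y = 0 is valid.
equation_solutions = {x for x in sample_X if y(x) == 0}

# Members of the sample for which the inequality y < 0 is valid.
inequality_solutions = {x for x in sample_X if y(x) < 0}
```

Within the sample, the equation keeps just two members and the inequality keeps those lying strictly between them.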




1.5. Variables, constants and parameters. We have now a good idea 
of what we mean by a variable. A set X of items (usually numbers in 
algebra) is specified; a variable x is any member of the set X, and X
is the replacement set or domain of the variable. A variable is a
‘place-holder’ for any member of its domain. It is not good enough to
describe a variable as an ‘unknown’. A variable is a member of a
precisely specified set (its domain) and we know exactly what values 
we allow it to have. 

The difficulty arises because of loose thinking about the solution of 
equations or inequalities. Suppose we are given some expression y in 
the variable x with domain X. We may seek those particular x’s 
which make y=0 (an equation) or which give y <0 (an inequality). 
Some x’s will do, others not. The result is the solution set, a narrowing
down of the domain X to a smaller set. For example, if X is the set
of all rationals, the solution set of the equation $2x^2 - x - 3 = 0$ is the
set $\{-1, 3/2\}$ of two rationals and that of the inequality $2x^2 - x - 3 < 0$
is the set of rationals x such that $-1 < x < 3/2$. It is not an adequate
description of this process to say that the variable x is an ‘unknown’
in the equation $2x^2 - x - 3 = 0$ and then to ‘find’ it as either $-1$ or
3/2. The process in the use of the formula (equation and inequality 
alike) is one of cutting down the domain X to some smaller set, the 
solution set. 

There is some lack of agreement on the terminology in the solution 
of equations. We know that, if we substitute either $x = -1$ or $x = 3/2$
in the expression $y = 2x^2 - x - 3$, then we get $y = 0$. How can we describe
this most conveniently? The value $-1$ or 3/2 may be called a
‘solution’ or a ‘root’ of the equation $2x^2 - x - 3 = 0$. The same value
may be called a ‘root’ or a ‘zero’ of the polynomial $y = 2x^2 - x - 3$.
The following is the terminology adopted here. The value $x = -1$
(or $x = 3/2$) is a zero of the expression $y = 2x^2 - x - 3$; on substitution
we find $y = 0$. The value $x = -1$ (or $x = 3/2$) is a root of the equation
$2x^2 - x - 3 = 0$; it is a value for which the equation is valid. The complete
set, here $\{-1, 3/2\}$, of all roots is the solution set of the equation. The
term solution is reserved for the process of finding the roots. In the
illustration given, the expression happens to be a polynomial, and
the equation a polynomial equation. The terms apply just as well to
any expression $y = f(x)$ in a variable x with domain X comprising
real numbers. It is useful to have two terms (zero and root) for what
might appear to be the same thing. There is, in fact, a distinction
which is worth making. An expression has zeros; an equation has
roots. The link is that a zero of the expression $y = f(x)$ is also a root of
the equation $f(x) = 0$; each is a value of x which, on substitution,
makes $f(x)$ vanish. The term to use depends on whether we have in
mind the expression or the equation.*

Interest in algebra is concentrated on relations between variables, 
particularly when there is a dependent variable y with range Y given 
uniquely in terms of a variable x over a domain X. The dependence is
shown by some algebraic expression such as $y = 2x^2 - x - 3$. There are
the two things to keep in mind. One is that the dependence is a 
linking of two sets, the domain X and the range Y. The other is that 
the dependence is expressed by means of some formula or other, and 
that there is a great variety of such formulae. 

In a specified algebraic expression, in addition to the variable x,
there are certain particular numbers called constants, all combined
by the simple processes of algebra. If the expression is $y = 2x^2 - x - 3$,
the constants are 2, $-1$ and $-3$ and the only algebraic processes are
addition and multiplication: $y = 2 \times x \times x + (-1) \times x + (-3)$. However,
the notation is flexible enough to accommodate more general
cases and to lead to more powerful methods of analysis. The expression
$2x^2 - x - 3$ is recognised as just one instance of a whole class
of quadratic polynomials with rational constants as coefficients.
In this case the coefficients happen to be 2, $-1$ and $-3$. There are
many other instances, for example $x^2 - \tfrac12 x + \tfrac14$ with coefficients 1,
$-\tfrac12$ and $\tfrac14$. We cannot possibly list them all; we would like a notation
enabling us to speak of any quadratic polynomial with rational
coefficients. This is a simple matter, once we allow for coefficients
which are parameters.† We write a general quadratic polynomial
as $y = ax^2 + bx + c$, where x is the variable and where a, b and c are
* Different terms are used by various writers. Two things are clear enough: a
polynomial equation has roots, a function of a complex variable has zeros. The lack of
agreement arises between these extremes. It is to be noticed that this use of ‘root’ is an
extension of a simpler usage. The ‘cube root’ of a real number a is the single real value
which, when raised to the third power, gives a. It is one root of the equation $x^3 = a$ (in
the present sense). There are also two other roots which are conjugate complex. It
might be preferable to reserve the term ‘root’ for the ordinary concept of the nth root
of a positive real number a, i.e. for the single positive real value satisfying $x^n = a$. We
could then use ‘zero’ both for functions and for equations. But it is established
terminology to speak of the roots of an equation.
† ‘Parameter’ is derived from the Greek: para = beyond, metria = measuring.




parameters. From one point of view, the parameters are constants, 
but not particularised. We then think of $y = ax^2 + bx + c$ as one
quadratic without specifying which one. From another point of view, 
the parameters are variable, when we switch from one quadratic to 
another or consider a whole class of quadratics. 

An essential feature of the parametric form is that we must
specify the replacement set of the parameters. The coefficients in
$y = ax^2 + bx + c$ are drawn from some specific set, which may be the
same as or different from the sets of values of x and y. For example,
the domain X may be all rational numbers (and the range Y some
other set of rationals) while the parameters a, b and c may also come
from the set of rationals, or they may be drawn from the set of
integers. Hence, we have the qualification: $y = ax^2 + bx + c$ is the
general quadratic with rational coefficients, with integral coefficients,
or whatever may be specified.

The parametric notation is one of great power. It enables us to
deal with a class of expressions of the same general type, to establish
properties valid for all expressions of the class. For example, it can
be shown (Appendix A.5) that the class of quadratic polynomials
$y = ax^2 + bx + c$ with rational coefficients, and such that $a > 0$, has the
common property that each has a single minimum. The graph of each
is of the form shown in Fig. 1.4. We can also locate the minimum;
it is $y = -(b^2 - 4ac)/4a$ attained where $x = -b/2a$. Again we may seek
the roots of the quadratic equation $ax^2 + bx + c = 0$, where a, b and c
are rationals such that $b^2 > 4ac$. We have our answer (Appendix A.3):
there are two roots $\{-b \pm \sqrt{b^2 - 4ac}\}/2a$. We can always proceed
from the general to the particular, by specifying the values of the
parameters. For example, put $a = 2$, $b = -1$ and $c = -3$, and we find
that $y = 2x^2 - x - 3$ has the smallest value $-25/8$ when $x = 1/4$, and that
$2x^2 - x - 3 = 0$ has the two roots $\tfrac14(1 \pm 5)$, i.e. $-1$ and 3/2.
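
The location of the minimum can be seen by completing the square. The appendix is not reproduced here, so the steps below are offered as the standard argument and are an assumption about how Appendix A.5 proceeds:

```latex
\begin{align*}
y &= ax^2 + bx + c \\
  &= a\left(x^2 + \frac{b}{a}\,x\right) + c \\
  &= a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a} + c \\
  &= a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a}.
\end{align*}
```

When $a > 0$ the squared term is never negative and vanishes only at $x = -b/2a$, which leaves $y = -(b^2 - 4ac)/4a$ as the least value. With $a = 2$, $b = -1$ and $c = -3$ this becomes $y = 2(x - \tfrac14)^2 - 25/8$.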


1.6. Unresolved problems in elementary algebra. It is perhaps inevi- 
table that the treatment of algebra at elementary levels glosses over 
certain difficulties of a rather troublesome nature. It is a good 
exercise to bring these difficulties out into the open, even if they 
cannot be overcome immediately. Several of them are considered here. 

(i) Consider the expression $y = a^x$ where x is a variable and a is a
parameter. This is a power of a with a variable exponent x. What is
an appropriate domain for x and what replacement set can be used
for a? In the customary usage (Appendix A.1), $a^x$ is a multiplied by
itself x times when x is a positive integer; it stands for a qth root
when x is a positive fraction p/q; and it is the reciprocal of the corresponding
positive power when x is negative (integral or fractional).
This is as far as elementary algebra goes. No meaning is attached to
irrational powers such as $a^{\sqrt{2}}$ or $a^{\pi}$. Hence, in writing $y = a^x$, the
domain of x cannot be wider than the set of rationals. Something
more is needed, outside the scope of elementary algebra, if x is to be
taken over the domain of all real numbers.

On the other hand, the parameter a can be drawn from the set of
real numbers, rational or irrational: $2^{\pi}$ is not defined but $\pi^2$ certainly
is. The difficulty is that a cannot be negative for certain values of
x in $a^x$; for example, $(-64)^{1/3} = \sqrt[3]{-64} = -4$, but $(-64)^{1/2} = \sqrt{-64}$
cannot be written. Further, $a = 0$ is possible in $a^x$ for some x but not
for others; for example $0^2 = 0^{1/2} = 0$, but no meaning is attached either
to $0^{-2}$ or to $0^{-1/2}$ since we cannot handle zero in the denominator.
The most we can do, in elementary algebra, if we take $y = a^x$ over the
domain of all rational x, is to limit a to the set of positive real numbers.

It is to be noticed, in particular, that: 


$a^{1/2} = \sqrt{a}$ = positive square root of positive a.


No square root is written of a negative value. But if a is positive, then
there is a positive square root $\sqrt{a}$, and also another or negative square
root to be written $-\sqrt{a}$. So, if $x^2 = a$, then $x = \sqrt{a}$ and $x = -\sqrt{a}$ are
the two roots.

(ii) The consequence is that the concept of common logarithms, as 
used in ordinary numerical work, is seriously undermined (Appendix 
A.2). The notation of a logarithm as an inverse power looks innocent 
enough. In writing $y = \log_{10} x$ we mean that $x = 10^y$. A common
logarithm is the exponent in a power of 10. We have just seen that,
as far as elementary algebra goes, we can write $10^y$ only when y is
rational. There is no difficulty with some logarithms; for example,
$\log_{10} 0.01 = -2$ since $10^{-2} = 0.01$, and $\log_{10} \sqrt{10} = 0.5$ since $10^{0.5} = \sqrt{10}$.
But what of $\log_{10} 2$ and many others? Tables of common logarithms
give us a value of $\log_{10} 2$, i.e. 0.3010 to four decimal places, 0.30103
to five decimal places, and so on. The implication is that


$\log_{10} 2 = 0.30103\ldots$,


an irrational number to be approximated to as many decimal places
as we wish. This would mean that $10^{0.30103\ldots} = 2$. And this is something
we cannot write, at least in elementary algebra.* To justify completely
the practical use of tables of logarithms requires more than
elementary algebra can provide.
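
The point can be illustrated numerically. The sketch below relies on floating-point arithmetic, which is itself an approximation, so it illustrates the difficulty rather than resolving it; the names exact_cases, x and error are assumptions.

```python
import math

# Two logarithms that are rational exponents of 10, as in the text.
exact_cases = [(0.01, -2.0), (math.sqrt(10), 0.5)]

# The tabulated value 0.30103 is a rational exponent, so 10 ** 0.30103 has meaning.
# It yields a number close to 2, but not 2 itself.
x = 10 ** 0.30103
error = abs(x - 2)
```

The tabulated entry belongs exactly to a number near 2, and only approximately to 2.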
(iii) Consider the quadratic equation $ax^2 + bx + c = 0$ with coefficients
which are real numbers ($a \neq 0$). For the moment, leave open
what domain we have in mind for x. The solution of the equation
leads to:

$x = \{-b \pm \sqrt{b^2 - 4ac}\}/2a \qquad (1)$


We seem to have the very convenient result that every quadratic
equation has two roots, one given by the + sign and the other by the
$-$ sign in (1). This is, however, more than the formula (1) can support.
If the domain of x is the set of all rationals, then we recognise only
rational roots. Hence, if $b^2 - 4ac$ is a perfect square, the quadratic
has roots, given as rationals by (1). Otherwise, there are no roots.
So $2x^2 - x - 3 = 0$ does have two roots $x = \tfrac14(1 \pm 5) = -1$ or 3/2 by (1);
but $x^2 - \tfrac12 x - \tfrac14 = 0$ has no roots since we cannot recognise the irrational
values $x = \tfrac14(1 \pm \sqrt{5})$ given by (1). There is one odd case: if $b^2 - 4ac = 0$,
(1) gives $x = -b/2a$ and the equation has a rational root, but only
one and not two. The difficulty is overcome by agreeing that there
are two rational roots which happen to coincide when $b^2 - 4ac = 0$.
We can do better if we allow the domain of x to be the set of all
real numbers, rational or irrational. Then (1) gives two real roots
provided that $b^2 - 4ac > 0$, and two real roots which happen to
coincide if $b^2 - 4ac = 0$. The equation $x^2 - \tfrac12 x - \tfrac14 = 0$ is now accommodated,
with roots $x = \tfrac14(1 \pm \sqrt{5})$. But we are still not able to say that
every quadratic equation with real coefficients has two roots. The
case where $b^2 - 4ac < 0$ defeats us. For example, for the equation
$x^2 - \tfrac12 x + \tfrac14 = 0$, (1) gives $x = \tfrac14(1 \pm \sqrt{-3})$ which we cannot recognise.
We cannot wriggle out by saying that the roots are real if $b^2 - 4ac > 0$
and ‘imaginary’ if $b^2 - 4ac < 0$; we must know what ‘imaginary’
numbers are and extend the domain of x to include them. To justify
the position, that every quadratic equation has two roots, the


* An interpretation is possible. The entry 0.30103 (to five decimal places) against
$\log_{10} 2$ can be interpreted: if $\log_{10} x = 0.30103$, then $x = 10^{0.30103}$ exactly and x is close
to 2. It is the 2 which is an approximation; $\log_{10} 2$ has no meaning but $\log_{10} x$ for
certain x close to 2 does have meaning.




domain of x must be extended further and this requires more than 
elementary algebra. 
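
The way the answer depends on the chosen domain can be sketched as a classification by the sign and nature of $b^2 - 4ac$. The function names is_rational_square and classify are invented for this sketch, and the coefficients are assumed to be rational (integers or Fraction values) so that the arithmetic is exact.

```python
from fractions import Fraction as F
from math import isqrt

def is_rational_square(d):
    """True if the rational d is the square of some rational."""
    d = F(d)
    n, m = d.numerator, d.denominator   # d in lowest terms, m > 0
    return n >= 0 and isqrt(n) ** 2 == n and isqrt(m) ** 2 == m

def classify(a, b, c):
    """Describe the roots of ax^2 + bx + c = 0 (a != 0, rational coefficients)."""
    d = F(b) ** 2 - 4 * F(a) * F(c)     # the quantity b^2 - 4ac
    if d < 0:
        return "no real roots"
    if d == 0:
        return "two coincident rational roots"
    if is_rational_square(d):
        return "two rational roots"
    return "two irrational real roots"
```

For the three equations of the text, classify(2, -1, -3) reports rational roots, classify(1, F(-1, 2), F(-1, 4)) reports irrational real roots, and classify(1, F(-1, 2), F(1, 4)) reports that no real roots exist.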
(iv) Consider a pair of linear equations in two variables:

$a_1 x + b_1 y + c_1 = 0$ and $a_2 x + b_2 y + c_2 = 0$

where we take both real coefficients and domains of real numbers for
the two variables x and y. Algebraic manipulation (Appendix A.4)
gives:

$x = \dfrac{b_1 c_2 - b_2 c_1}{a_1 b_2 - a_2 b_1}, \quad y = \dfrac{c_1 a_2 - c_2 a_1}{a_1 b_2 - a_2 b_1} \qquad (2)$

We seem to have a very convenient result: every pair of linear 
equations has a unique solution, the real values of x and y being 
given by (2). There is no difficulty, as with the quadratic equation 
of (iii), about ‘imaginary’ values. But there is a difficulty. If 
$a_1 b_2 - a_2 b_1 \neq 0$, the algebra is correct and (2) always gives real values
for x and y. But what if $a_1 b_2 - a_2 b_1 = 0$? The results (2) are then
without meaning, since each denominator is zero. In fact, the algebra 
leading to (2) has gone astray and we are left with (apparently) no 
solution. 
A graphical approach helps here. If we plot the relation
$a_1 x + b_1 y + c_1 = 0$, as in Fig. 1.6, we get a line $L_1$; similarly from
$a_2 x + b_2 y + c_2 = 0$ we get another line $L_2$. The values of x and y given
by the point P of intersection of the two lines satisfy both relations, i.e.
provide the solution of the pair of equations.
The case illustrated has $x + y - 3 = 0$ for $L_1$ and
$2x - 3y + 3 = 0$ for $L_2$. The point of intersection
has co-ordinates $x = 1.2$ and $y = 1.8$, the solution
of the two equations, as can be checked from
(2). The cases of failure are now evident. The
lines $L_1$ and $L_2$ meet in a single point as long
as they are distinct and not parallel; this is so
when $a_1 b_2 - a_2 b_1 \neq 0$. If the lines are distinct and
parallel, then there is no point of intersection and we can say that
the two equations have no solution because they are inconsistent.
If the lines coincide, then there are indefinitely many points of ‘intersection’
and we can say that the two equations have an indeterminate
solution because they are identical.

[Fig. 1.6]




Hence, as far as elementary algebra is concerned, the result is that
the two linear equations have a unique solution given by (2), provided
that $a_1 b_2 - a_2 b_1 \neq 0$. Cases where $a_1 b_2 - a_2 b_1 = 0$ cannot be handled.
These are often called degenerate cases; they need to be examined
further.
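
The three cases can be sketched in code. The function solve_pair and its return labels are assumptions made for illustration; integer coefficients are assumed so that the arithmetic is exact, and each equation is assumed to be a genuine line (its coefficients of x and y not both zero).

```python
from fractions import Fraction as F

def solve_pair(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0 (integer coefficients)."""
    det = a1 * b2 - a2 * b1
    if det != 0:
        # The lines meet in a single point, given by formula (2).
        x = F(b1 * c2 - b2 * c1, det)
        y = F(c1 * a2 - c2 * a1, det)
        return ("unique", x, y)
    # det == 0: the lines are parallel or coincide.
    if a1 * c2 == a2 * c1 and b1 * c2 == b2 * c1:
        return ("indeterminate",)   # the same line written twice
    return ("inconsistent",)        # distinct parallel lines
```

For the case illustrated, solve_pair(1, 1, -3, 2, -3, 3) gives the point (6/5, 9/5), that is x = 1.2 and y = 1.8. The degenerate cases are separated by asking whether the two sets of coefficients are proportional throughout.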


1.7. Notation. Mathematical exposition is greatly facilitated by a 
good notation. Conversely, a clumsy notation puts off the reader and 
tends to prevent the full exploitation of results. Notational difficulties
are twofold. There is an acute shortage of letters and other symbols 
for use and a notation must be extremely economical. There is the 
need to ensure that a notation is simple, concise and pleasing — and 
at the same time generally accepted and understood. Not all notations 
are economical in the use of letters, nor are they always generally 
adopted; different writers may well use various notations for the 
same thing. However, a considerable degree of uniformity does exist 
and it is usually wise to stick to what is generally in use. 

There are 26 letters in the Roman alphabet. The supply can be 
doubled by taking both small and capital letters and further 
increased by pressing the Greek alphabet into service: 


Small  Capital  Name     Equivalent  |  Small  Capital  Name     Equivalent

α      Α        alpha    a           |  ν      Ν        nu       n
β      Β        beta     b           |  ξ      Ξ        xi       x
γ      Γ        gamma    g (hard)    |  ο      Ο        omicron  o (short)
δ      Δ        delta    d           |  π      Π        pi       p
ε      Ε        epsilon  e (short)   |  ρ      Ρ        rho      r
ζ      Ζ        zeta     z           |  σ      Σ        sigma    s
η      Η        eta      e (long)    |  τ      Τ        tau      t
θ      Θ        theta    th          |  υ      Υ        upsilon  u
ι      Ι        iota     i           |  φ      Φ        phi      ph
κ      Κ        kappa    k           |  χ      Χ        chi      ch (hard)
λ      Λ        lambda   l           |  ψ      Ψ        psi      ps
μ      Μ        mu       m           |  ω      Ω        omega    o (long)


Certain letters or groups of letters are usually reserved for particular
purposes. The later letters are used for variables: x, y, z and
sometimes u, v, w; the Greek letters ξ, η, ζ are also employed for this
purpose. For constants or parameters, it is usual to take early letters
(a, b, c or the Greek α, β, γ) where the constant aspect is stressed; and
to take middle letters (k, l, m, n or the Greek κ, λ, μ, ν) for a parametric
interpretation. A few letters are kept almost entirely for particular
purposes. The constants π = 3.14159 ... and e = 2.71828 ... are of such
importance as almost to pre-empt these letters. Similarly Σ is reserved
to denote summation. A natural notation for variable time is
t or τ. In the calculus, d and D are used for derivatives, δ and Δ for
finite increments or differences. Almost invariably ε denotes a small
constant, and θ often indicates a proper fraction (between 0 and 1).
General expressions or functions lay claim to f, g, F, G and the
Greek φ, ψ. These reservations, however, are subject to exceptions.
For example, g may stand for the constant of gravity and F for a
field (and not for a function), and θ can be used for an angle (and not
a proper fraction).

These resources are still not enough. One way of stretching them 
further is to print in different types. Letters used as symbols are 
conveniently printed in italics, e.g. a and A, but different types are 
possible, e.g. a and A in bold. The difficulty here — and it prevents 
widespread use — is that what is possible in print is very difficult to 
convey in manuscript or typescript. Bold type can, however, be 
usefully introduced in such particular fields as matrix theory, and 
it is so used in Chapter 13 below. 

Non-literal symbols are employed to economise on letters, mainly
for relations between or operations on entities denoted by letters.
Well-known examples are =, < and > for ‘equals’, ‘less than’ and
‘greater than’ respectively. Variants on these are less familiar but
very economical in use. So ≤ means ‘less than or equal to’ and ≥
‘greater than or equal to’. Further, ≠ means ‘not equal to’; ≮ and
≯ have similar negative interpretations. Other non-literal symbols
are involved in $n!$, $\binom{n}{r}$ and $|a|$ as defined below, and the symbol ∫
appears in an integral. However, it is not helpful to clarity to scatter
around large numbers of such symbols. Hence, some operations are
denoted, not by non-literal symbols, but by an abbreviated form of
the name of the operation, e.g. Lim for ‘limit’ and Max for ‘maximum’.

There is one further extension possible, and it is a very important 
one: the use of numerical or literal subscripts. This provides a fine 
example of how the joint needs of economy and precision are satisfied. 


Much of mathematics deals with ‘many’: either a specified but
large number, or more usually an unspecified number, of things.
Consider a set of constants, say the coefficients of a polynomial.
There may, perhaps, be 4 of them, as in a cubic; they can be written
a, b, c and d. There may be an unspecified number of them, as in a
polynomial of nth degree, and they may be written a, b, c, ... k.
But this is both vague and wasteful. One letter, modified by subscripts,
does much better: $a_1, a_2, a_3, a_4$ for four constants;
$a_1, a_2, a_3, \ldots a_n$ for n constants. The notation can then be condensed further,
by using general subscripts (i and j, or r and s are commonly adopted)
and by indicating the values they take. So, for 4 and n constants:

$a_r$  r = 1, 2, 3, 4 ;  $a_s$  s = 1, 2, 3, ... n.

These are simple sequences. A further development suggests itself, 
to accommodate double (or higher) sequences. A double array of 
m xn constants can be denoted by a single letter and two subscripts: 


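$$\begin{array}{ccccc}
a_{11} & a_{12} & a_{13} & \ldots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \ldots & a_{2n} \\
\ldots & \ldots & \ldots & \ldots & \ldots \\
a_{m1} & a_{m2} & a_{m3} & \ldots & a_{mn}
\end{array}$$


where the first subscript indicates the row and the second the column.
The array can then be drastically and conveniently condensed to


$a_{rs}$  r = 1, 2, 3, ... m and s = 1, 2, 3, ... n.

Triple arrays can be handled similarly, with three subscripts, and
so on.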
The use of subscripts makes it possible to denote sums of items in a
very compact form, by means of the Σ notation. Here the Greek
capital Σ stands for ‘sum’ the items which are indicated by the symbol
or symbols following Σ. So, as a matter of notation and in the interests
of brevity, write:


$\sum_{r=1}^{4} a_r = a_1 + a_2 + a_3 + a_4$

$\sum_{r=1}^{n} a_r = a_1 + a_2 + a_3 + \ldots + a_n$.

The flexibility of the notation is seen in a few illustrations: 


$\sum_{r=1}^{3} a_r b_r = a_1 b_1 + a_2 b_2 + a_3 b_3$


n 
StF τεαγαῖ Ἔα, + Ag03 +... αι, ας 




$y_1 = \sum_{s=1}^{n} a_{1s} x_1 x_s = a_{11} x_1^2 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + \ldots + a_{1n} x_1 x_n$.


Notice that the subscript r used in a sum like $\sum_{r=1}^{n} a_r b_r$ is ‘knocked out’
by the Σ; the expression obtained does not depend on r. In fact, it
can be changed without altering anything; it is just a matter of a
convenient label:

$\sum_{r=1}^{n} a_r b_r = \sum_{s=1}^{n} a_s b_s = \sum_{i=1}^{n} a_i b_i = \ldots$

are all the same thing ($a_1 b_1 + a_2 b_2 + a_3 b_3 + \ldots + a_n b_n$). Such a subscript
is called a dummy subscript. On the other hand, there may be a free
subscript which appears in every item and which is not subjected to a
summation process. In the last illustration above, $y_1 = \sum_{s=1}^{n} a_{1s} x_1 x_s$,
the subscript 1 is a free one. If it is changed, say to 2, then a different
expression $y_2 = \sum_{s=1}^{n} a_{2s} x_2 x_s$ results. This suggests a further development.
Take a sequence of free subscripts and write:


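$y_1 = \sum_{s=1}^{n} a_{1s} x_1 x_s = a_{11} x_1^2 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + \ldots + a_{1n} x_1 x_n$

$y_2 = \sum_{s=1}^{n} a_{2s} x_2 x_s = a_{21} x_2 x_1 + a_{22} x_2^2 + a_{23} x_2 x_3 + \ldots + a_{2n} x_2 x_n$

$\ldots$

$y_m = \sum_{s=1}^{n} a_{ms} x_m x_s = a_{m1} x_m x_1 + a_{m2} x_m x_2 + a_{m3} x_m x_3 + \ldots + a_{mn} x_m x_n$.
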

Then the general expression is:

$y_r = \sum_{s=1}^{n} a_{rs} x_r x_s$  r = 1, 2, 3, ... m

where r is the free subscript, as opposed to the dummy s. Finally, r
can be ‘knocked out’, or made a dummy, by summing a second time:

$\sum_{r=1}^{m} y_r = \sum_{r=1}^{m} \sum_{s=1}^{n} a_{rs} x_r x_s$

which, on writing out in full, is a double array of terms, all added.
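
Free and dummy subscripts have a direct counterpart in code, sketched below. The array a and the values x are arbitrary numbers chosen for illustration, the square case m = n = 3 is assumed for simplicity, and Python counts subscripts from 0 rather than 1.

```python
# A small double array a[r][s] and values x[r]; the numbers are arbitrary.
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, -1, 2]
n = len(x)

# y_r = sum over s of a_rs * x_r * x_s: r is free, s is a dummy.
y = [sum(a[r][s] * x[r] * x[s] for s in range(n)) for r in range(n)]

# Renaming the dummy subscript (s to t) changes nothing.
y_renamed = [sum(a[r][t] * x[r] * x[t] for t in range(n)) for r in range(n)]

# Summing over r as well knocks out the remaining free subscript.
total = sum(y)
```

The inner loop variable is the dummy: it has no existence outside the sum. The outer variable r is free until the final summation removes it too.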

Caution is needed; the Σ notation is very concise and convenient
but, until experience is gained in its use, it is easy to make errors.
The summation process of Σ can deal easily with sums and constant
factors:

$\sum_{r=1}^{n} (a_r + b_r) = \sum_{r=1}^{n} a_r + \sum_{r=1}^{n} b_r; \qquad \sum_{r=1}^{n} k a_r = k \sum_{r=1}^{n} a_r$ (k constant).


For example, the following are perfectly in order: 


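$\sum_{r=1}^{n} (a_r + b_r)^2 = \sum_{r=1}^{n} (a_r^2 + 2 a_r b_r + b_r^2) = \sum_{r=1}^{n} a_r^2 + 2 \sum_{r=1}^{n} a_r b_r + \sum_{r=1}^{n} b_r^2$

and $\sum_{s=1}^{n} a_{rs} x_r x_s = x_r \sum_{s=1}^{n} a_{rs} x_s.$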


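But care must be taken not to try to do this kind of thing for sums of
products. $\sum_{r=1}^{n} a_r b_r = a_1 b_1 + a_2 b_2 + a_3 b_3 + \dots + a_n b_n$ is not reducible in
any way. In particular:

$\sum_{r=1}^{n} a_r b_r \neq \left(\sum_{r=1}^{n} a_r\right)\left(\sum_{r=1}^{n} b_r\right); \qquad \sum_{r=1}^{n} a_r^2 \neq \left(\sum_{r=1}^{n} a_r\right)^2$

and $\sum_{r=1}^{n} (a_r + b_r)^2 \neq \left(\sum_{r=1}^{n} a_r + \sum_{r=1}^{n} b_r\right)^2.$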
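The rules for Σ can be tried out numerically. The following is a minimal sketch in Python, with arbitrarily chosen illustrative values for the lists a and b and the constant k (none of them taken from the text); it checks that sums and constant factors pass through Σ, while sums of products do not factorise.

```python
# Illustrative values only; any two lists of equal length would serve.
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
k = 3

# Sums and constant factors pass through the summation sign.
assert sum(ar + br for ar, br in zip(a, b)) == sum(a) + sum(b)
assert sum(k * ar for ar in a) == k * sum(a)

# Sums of products are not reducible to products of sums.
sum_of_products = sum(ar * br for ar, br in zip(a, b))  # a1*b1 + ... + an*bn
product_of_sums = sum(a) * sum(b)                       # (a1+...+an)(b1+...+bn)
sum_of_squares = sum(ar ** 2 for ar in a)
square_of_sum = sum(a) ** 2

print(sum_of_products, product_of_sums)  # two different numbers
print(sum_of_squares, square_of_sum)     # again different
```

A single numerical case cannot prove the rules, but one case is enough to show that the forbidden factorisations fail.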


The operations of mathematics are denoted by symbols, the 
operators which say ‘carry out the operation’ concerned. It is usually 
the convention that, if several operations are made, the operators 
are to be read from right to left. The use of brackets, which are 
eliminated ‘from the inside out’ in simplification of an expression, 
helps considerably in keeping the order of operations. But brackets 
are often omitted, for brevity, and care is needed in (mentally) 
putting them back. For example, the operations of multiplication ×,
of squaring $(\dots)^2$ and of square root extraction $\sqrt{(\dots)}$ are combined in:

$\sqrt{xy^2} = \sqrt{\{x \times (y)^2\}}$ (reducing to $y\sqrt{x}$ when y > 0).

This means: square y, multiply the result by x and take the root of
the result. The order here cannot be changed; it is read from right to
left as shown. It is quite a different thing to write:

$x\sqrt{(y^2)}$ (reducing to xy when y > 0).


This becomes very important as more operators are added to the 
mathematician’s armoury. In the calculus, the operator D means 
‘take the derivative of’. Consider a combination of D with the 
operator $\sqrt{\ }$ for ‘take the square root of’:

$D\sqrt{1 + x^2} = D\{\sqrt{(1 + x^2)}\}$

which means take the square root of $1 + x^2$ and then take the derivative
of the result. Calculus provides the answer: $\dfrac{x}{\sqrt{(1 + x^2)}}$. This
expression is not the same as:

$\sqrt{D(1 + x^2)} = \sqrt{\{D(1 + x^2)\}}$

which means take the derivative of $1 + x^2$ and then take the square
root of the result. The answer now is: $\sqrt{2x}$.
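That the order of operators matters can be seen numerically. The sketch below is an illustration only: the operator D is replaced by a central-difference approximation (an assumption made here so that no calculus library is needed), and the helper names are hypothetical.

```python
import math

def D(f, h=1e-6):
    """Approximate 'take the derivative of f' by a central difference."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def sqrt_of(f):
    """The operator 'take the square root of f'."""
    return lambda x: math.sqrt(f(x))

def inner(x):
    return 1 + x ** 2

d_after_sqrt = D(sqrt_of(inner))   # D sqrt(1 + x^2): root first, then derivative
sqrt_after_d = sqrt_of(D(inner))   # sqrt D(1 + x^2): derivative first, then root

x = 2.0
print(d_after_sqrt(x), x / math.sqrt(1 + x ** 2))  # both close to x / sqrt(1 + x^2)
print(sqrt_after_d(x), math.sqrt(2 * x))           # both close to sqrt(2x)
```

The two compositions give quite different values at the same x, in line with the two answers quoted above.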

In conclusion, we can take note here of two special notations which 
are of frequent use in a variety of connections. The first provides a 
way of writing the product of all positive integers from 1 to n: 

NOTATION: The product of the integers from 1 to n is called n factorial:

$n! = n(n-1)(n-2) \dots 2 \cdot 1.$

There are some obvious properties:

$n! = n\,(n-1)!$ or $n = \dfrac{n!}{(n-1)!}.$

More generally, if r is a positive integer less than n:

$n! = n(n-1) \dots (n-r+1)\,(n-r)!$ or $\dfrac{n!}{(n-r)!} = n(n-1) \dots (n-r+1).$

Out of this there develops a further conventional notation:

NOTATION: $\dbinom{n}{r} = \dfrac{n!}{r!\,(n-r)!} = \dfrac{n(n-1) \dots (n-r+1)}{r!}$

where n and r are integers (r < n).

In addition, it is sometimes convenient to write $\binom{n}{0} = \binom{n}{n} = 1$. The
expression $\binom{n}{r}$ is known in elementary algebra as the ‘binomial
coefficient’ or as the ‘number of combinations of n things r at a time’,
alternatively written ${}^{n}C_{r}$.
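The factorial properties and the two forms of the binomial coefficient can be checked by direct computation. This is a small sketch in Python; the values n = 7 and r = 3 are arbitrary, and `falling_product` is a hypothetical helper name for the product n(n−1)...(n−r+1).

```python
import math

def falling_product(n, r):
    """n(n-1)...(n-r+1): the product of r factors counting down from n."""
    result = 1
    for i in range(r):
        result *= n - i
    return result

n, r = 7, 3  # illustrative values with r < n

# n! = n (n-1)!
assert math.factorial(n) == n * math.factorial(n - 1)

# n!/(n-r)! = n(n-1)...(n-r+1)
assert falling_product(n, r) == math.factorial(n) // math.factorial(n - r)

# The two forms of the binomial coefficient agree with the library value.
binom = falling_product(n, r) // math.factorial(r)
assert binom == math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
assert binom == math.comb(n, r)
print(binom)
```

`math.comb` also returns 1 for r = 0 and for r = n, matching the convention mentioned above.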


The other notation is for the magnitude of a real number, sign
ignored:*

NOTATION: The absolute value or modulus of a real number a is:

$|a| = \text{positive number of pair } (a, -a) = \sqrt{a^2}.$

It is immaterial whether a = 0 is considered as a possibility here,
since |a| = 0 can then be written if we wish. The following properties
are derived:

$|a| = a \ (a > 0)$ and $|a| = -a \ (a < 0)$
$|ab| = |a| \times |b|$
$|a + b| \le |a| + |b|.$

Only the last causes any trouble; it is to be established by considering
in turn all four cases obtained by taking a positive and negative and
b positive and negative.

* The notation extends to the absolute value or modulus of a complex number
(2.5 below).
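The four sign cases can be run through mechanically. The sketch below uses the illustrative magnitudes 3 and 5 (an arbitrary choice) and a hypothetical function name `modulus`; it checks the three properties in each case but is, of course, no substitute for the general argument.

```python
import math
from itertools import product

def modulus(a):
    """|a|: a itself when a >= 0, otherwise -a."""
    return a if a >= 0 else -a

# The four cases: a positive or negative, b positive or negative.
cases = list(product([3.0, -3.0], [5.0, -5.0]))
for a, b in cases:
    assert modulus(a) == math.sqrt(a ** 2)
    assert modulus(a * b) == modulus(a) * modulus(b)
    assert modulus(a + b) <= modulus(a) + modulus(b)
    print(a, b, modulus(a + b), modulus(a) + modulus(b))
```

Equality in the last property occurs only in the two cases where a and b have the same sign.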


1.8. References. There are several short introductions designed to 
tell a general reader what mathematics is about and how its tech- 
niques are applied. 

W. W. Sawyer: Mathematician’s Delight (Pelican Books, London, 
1943) 

A. N. Whitehead: An Introduction to Mathematics (Home University 
Library, London, 1911) 

are both successful in their rather different ways. As a first approach 

to the basic concepts of ‘modern’ mathematics, and therefore as a 

good preliminary reading before embarking on the present text, the 

following can be recommended: 

Irving Adler: The New Mathematics (John Day, N.Y., 1958) 

W. W. Sawyer: Prelude to Mathematics (Pelican Books, London, 
1955); and A Concrete Approach to Abstract Algebra (W. H. 
Freeman, San Francisco, 1959). 

The traditional courses on elementary mathematics in schools are 

reviewed at some length and re-interpreted in terms of basic ideas in: 

W. L. Schaaf: Basic Concepts of Elementary Mathematics (John
Wiley, N.Y., 1960)

The present volume is intended to take the reader farther, and 
more deeply, into basic mathematics than any of these introductory 
books do. It leaves plenty of scope for parallel reading and for 
exercises in various fields. The following represents a selection of the 
more specialist, but not very advanced, texts which can be used for 
the purpose. In algebra and finite mathematics: 

J. G. Kemeny, J. L. Snell and G. L. Thompson: Finite Mathematics 
(Prentice-Hall, Englewood Cliffs, N. J., 1957) 

D. C. Murdoch: Linear Algebra for Undergraduates (John Wiley, 
N.Y., 1957) 




G. Birkhoff and S. MacLane: A Survey of Modern Algebra (Macmillan,
N.Y., Revised Edition, 1953)

These texts have nothing to say about calculus, or about mathe- 

matical analysis generally. For this, good references are: 

S. I. Altwerger: Modern Mathematics (Macmillan, N.Y., 1958) 

R. Courant and H. Robbins: What is Mathematics? (Oxford University 
Press, 1941) 

together with the ever-green and indispensable:

G. H. Hardy: Pure Mathematics (Cambridge University Press, 1st 
Edition, 1908; 10th Edition, 1952). 

On linear systems, there is a stimulating if rather more advanced text: 

R. A. Frazer, W. J. Duncan and A. R. Collar: Elementary Matrices 
and some Applications to Dynamics and Differential Equations 
(Cambridge University Press, 1947). 

As the development proceeds, in the following chapters, brief 
references are given to the great mathematicians of the past and to 
the years in which they lived. These are intended as sign-posts for 
those interested in the historical evolution of the subject, an interest 
much to be encouraged. Parallel reading can be profitably undertaken 
in the history of mathematics, for example from: 

E. T. Bell: The Development of Mathematics (McGraw-Hill, N.Y., 
2nd Edition, 1945). 


1.9. Exercises 


1. The temperature of water is taken (at sea level) and the statement made:
‘y° F is the same temperature as x° C’. Express y in terms of x and specify
the domain of x. (0° C corresponds to 32° F and 100° C to 212° F.)

2. Extract the four aces from a pack of playing cards and select x aces from
the set of four. Let y be the number of different selections of x aces which can
be made. Show that x has the domain {1, 2, 3, 4} and y the range {1, 4, 6}.

3. Draw a graph to show the expressions $y = x^2 - 2$ (x rational) and $y = 4/x$
(x positive rational). Hence find one root of $x^3 - 2x - 4 = 0$.

4. Plot $y = \sqrt{(4 - x^2)}$ as a graph for real x, non-negative and not greater than
2 ($0 \le x \le 2$). Find the solution sets for $\sqrt{(4 - x^2)} = 1$ and for $\sqrt{(4 - x^2)} < 1$.

5. Show that $y = |x|$ and $y = \sqrt{x^2}$ are identical expressions; represent
graphically for real x, $-2 \le x \le 2$.

6. nth Roots. If a is a positive real number, the notation $\sqrt{a}$ stands for the
positive square root of a. Why is no such qualification needed for $\sqrt[3]{a}$? What
of $\sqrt[4]{a}$, $\sqrt[5]{a}$, ...

7. Complete the square in $y = ax^2 + bx + c$ and show that, if a and c have the
same sign and if $b^2 < 4ac$, then y > 0 for all real x (a and c positive) or y < 0 for
all real x (a and c negative).

8. Since the solution of an equation is unchanged by the removal of a
multiplicative factor, show that polynomial equations with integral and with
rational coefficients are interchangeable concepts. Illustrate by showing that
$8x^2 - 6x + 1 = 0$ and $x^2 - \tfrac{3}{4}x + \tfrac{1}{8} = 0$ are the same, with solution set {1/4, 1/2}.

9. Show that the quadratic $(4\pi - 1)x^2 - 6x - 8 = 0$ with real coefficients has
two real roots $x = \{3 \pm \sqrt{(32\pi + 1)}\}/(4\pi - 1)$. Show that it arises in finding the
radius (x feet) of a sphere with surface area equal to the area of a rectangle
(x + 2) feet by (x + 4) feet. Establish that there is such a sphere, with radius a
little over 1 foot.

10. Solution of Cubic Equations. Draw the graph of $y = 2x^3 - 3x^2 + 2$, insert
y = 1, y = 3/2, y = 2 and y = 3, and indicate why $2x^3 - 3x^2 - 1 = 0$ has only one and
$2x^3 - 3x^2 + \tfrac{1}{2} = 0$ has three real roots. What of $2x^3 - 3x^2 + 1 = 0$ and $2x^3 - 3x^2 = 0$?

11. Show that $x^4 - x^2 - 2x + 2 = (x^2 - 1)^2 + (x - 1)^2$ is positive for all real x
except that it has a zero when x = 1. Deduce that $x^4 - x^2 - 2x + 2 = 0$ has only
two real roots, i.e. a double root x = 1.

12. If x + y − 4 = 0 and x − y + 2 = 0, show that x = 1 and y = 3. If x + y − 4 = 0
and x − y + 2 > 0, show that y = 4 − x for x > 1. Illustrate graphically.

13. If k is a positive parameter, express the solution of y = 1 − x and y = x + k
in terms of k. Restrict to x ≥ 0, y ≥ 0, and show that there is no solution unless
k ≤ 1. Interpret y = 1 − x as the demand for a commodity at price x and
y = x + k as the supply, shifting with k. Illustrate graphically.


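14. Show that $\sum_{r=1}^{n} a_r x_r^2 > 0$ for all real $x_1, x_2, \dots, x_n$ if and only if $a_r > 0$ (all r).

15. From $\sum_{r=1}^{n} a_r A_s = A_s \sum_{r=1}^{n} a_r$ deduce
$\sum_{s=1}^{n} \sum_{r=1}^{n} a_r a_s = \sum_{s=1}^{n} \left(a_s \sum_{r=1}^{n} a_r\right) = \left(\sum_{r=1}^{n} a_r\right)\left(\sum_{s=1}^{n} a_s\right).$
The last expression can be written $\left(\sum_{r=1}^{n} a_r\right)^2$; why? If n = 2, show that all these
double sums are simply: $a_1^2 + 2a_1 a_2 + a_2^2$.

*16. Show that $\left(\sum_{r=1}^{n} a_r + \sum_{r=1}^{n} b_r\right)^2 = \left(\sum_{r=1}^{n} a_r^2 + 2\sum_{r=1}^{n} a_r b_r + \sum_{r=1}^{n} b_r^2\right) + A$
where $A = \sum\sum' a_r a_s + 2\sum\sum' a_r b_s + \sum\sum' b_r b_s$ ($\sum\sum' = \sum_{r=1}^{n} \sum_{s=1}^{n}$ excluding r = s).
Deduce that the difference $\left(\sum_{r=1}^{n} a_r + \sum_{r=1}^{n} b_r\right)^2 - \sum_{r=1}^{n} (a_r + b_r)^2$ is A.

17. Illustrate the use of brackets by writing $a^{b^c}$ either as $(a)^{b^c}$ or as $(a^b)^c = a^{bc}$.
(The first is the usual interpretation.) As an instance, show that $2^{3^2} = (2)^{3^2} = 512$
and that $2^{3 \times 2} = (2^3)^2 = 64$.

18. Binomial Theorem. Show that $(1 + x)^2 = 1 + 2x + x^2 = \sum_{r=0}^{2} \binom{2}{r} x^r$ and
that $(1 + x)^3 = 1 + 3x + 3x^2 + x^3 = \sum_{r=0}^{3} \binom{3}{r} x^r$. Generalise to: $(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r$.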


CHAPTER 2 


NUMBER SYSTEMS. 


2.1. Rational numbers. A strictly logical treatment of mathematics 
starts with the general concept of sets on which all mathematics, and 
indeed formal logic, are based. It then proceeds, in an inevitably 
leisurely way, to groups and rings, to relations, mappings and trans- 
formations, all on an abstract level. It is some time before anything 
at all recognisable is reached; an attempt to drag in familiar con- 
structions as illustrations could be made but it would be artificial 
and unconvincing. If interest is to be aroused and maintained, a 
compromise has to be reached between the desire for logical develop- 
ment from (unfamiliar) basic ideas on the one hand, and the need to 
keep in touch with (familiar) practical mathematics on the other 
hand. The present chapter and the next one represent such a com- 
promise. 

Much of mathematics, though by no means all, makes use of num- 
bers. It is concerned with relations between variables taking numerical 
values. In application, it refers to ‘objects’ and ‘entities’ which are 
ordered and/or measured. Much of mathematics, but again not all, 
boils down in the end to solving equations, to finding the particular 
variables which satisfy given conditions: when will a ball thrown 
into the air start to come down, how much wheat will be sold at such 
a price, what is the particular path followed by the ball or by the 
price of wheat over time? With this in mind, we start here with the 
familiar number system, the numbers described as ‘rationals’. At the 
same time, we bring in the equally familiar polynomial equations, 
specifically the linear, quadratic and cubic equations. 

The treatment in these two chapters involves a good deal of 
jobbing backwards and forwards. Inevitably it will require tidying 
up later on when a more steady progress is made from the basic ideas 
of sets. There are, however, some very clear advantages. We are able 
to see more clearly, not only the correct ways of doing things, but
also the reasons why they have to be done. Jobbing backwards
shows up the need to take nothing for granted, to query everything, 
to delve deeper until the barest minimum of the undefined is exposed. 
Jobbing forwards avoids the awkward and unnecessary ‘compart- 
ment’ method of treatment. There is no need to exhaust one subject 
before tackling another; indeed there is much to be lost since mathe- 
matical techniques are highly inter-related. Further, when we do 
start on sets, groups and other basic concepts, we have plenty of 
material ready to hand for illustration. 

Rational numbers comprise all the simple numbers dealt with in 
elementary arithmetic: positive and negative integers and fractions 
(ratios of integers), together with zero as the number separating the 
positive from the negative. The whole system of rationals can be 
looked upon either as a collection or set, or as an ordered sequence. 
(The concepts of sets and sequences here are the obvious ones; more 
specific developments of them come later.) The numbers are subject 
to the operations of addition and subtraction, of multiplication and 
division,* which we take for the moment as self-evident. As a set, the 
remarkable feature of the rationals is that they are closed with respect 
to the operations. Add or subtract any two rationals, multiply or 
divide them, and we still get a rational. The combination of rationals 
by means of +, —, x and + always gives another rational and never 
anything outside the set. As a sequence, the rationals are indefinitely 
extended, stretching in both directions through positive and negative 
numbers from 0. Further, between any two rationals (e.g. 0 and 1), 
there are as many others as we like to specify. They are uniquely 
ordered by greater and less. We can arrange any subset of them in 
ascending order, e.g.: 


$-\tfrac{3}{2},\ -1,\ -\tfrac{1}{2},\ -\tfrac{1}{4},\ 0,\ \tfrac{1}{4},\ \tfrac{1}{2},\ \tfrac{3}{4},\ 1,\ \tfrac{3}{2}.$
It is easy to determine which of two rationals is the larger; for
example 14/27 > 71/139 since 14 × 139 = 1946 is greater than
27 × 71 = 1917.
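The comparison by cross-multiplication, and the closure and density of the rationals, can be illustrated with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the variable names are chosen here for illustration.

```python
from fractions import Fraction

a = Fraction(14, 27)
b = Fraction(71, 139)

# Cross-multiplication: with positive denominators, a > b exactly when
# 14 * 139 > 27 * 71.
lhs = 14 * 139
rhs = 27 * 71
print(lhs, rhs, a > b)

# Closure: +, -, x and / applied to two rationals give rationals again
# (division requires a non-zero divisor).
results = [a + b, a - b, a * b, a / b]
assert all(isinstance(value, Fraction) for value in results)

# Between any two rationals there is another, for example their mean.
between = (a + b) / 2
assert b < between < a
```

The mean is only one of as many intermediate rationals as we like to specify; repeating the step on (b, between) gives another, and so on.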

* Provided that we do not divide by zero.

There are some difficulties. As in nearly all number systems, the
number zero needs careful handling, particularly as regards the
operation of division. The way out here is explicitly to exclude the
number zero from the multiplication/division process, keeping it for
addition/subtraction. Another point arises in attempting to design a
suitable notation. Rationals may be represented by p/q, where p and
q are positive or negative integers, together with p = 0. Whether this
is satisfactory or not depends on the view taken about the ordering
of rationals. If we are happy to have partial ordering according to
greater, less and equals, with many rationals left on an equal footing,
then the notation p/q serves. If we want, as is implicitly assumed
above, a strict ordering according to greater and less, with no two
rationals ranked equal, then it won’t do. The notation shows

$\dfrac{-2}{3},\ \dfrac{2}{-3},\ \dfrac{-4}{6},\ \dfrac{4}{-6},\ \dfrac{-6}{9},\ \dots$

as all different. They are equal from the ordering point of view; each
can be represented by the single rational $-\tfrac{2}{3}$. There is the need to
eliminate duplication and perhaps the simplest notation for what
we have in mind is: ±p/q, where p and q are relatively prime†
positive integers together (as before) with p = 0.

This discussion illustrates that, though arithmetic can proceed 
with fractions, it tends to be messy. An alternative is to work with 
the decimal notation, generally simpler but with a different kind
of complication: the need to handle recurring decimals (2.9 Ex. 2).


2.2. The operational rules and order properties. The set of rational
numbers is denoted by R. There are two operations, addition (+)
and multiplication (×), each applied to two members of R to give
another member of R. The processes of subtraction and division are
subsidiary (see 2.9 Ex. 3), to be derived from addition and multiplication
respectively. The rules of addition and multiplication are so
well-known that they are applied, even in the most elementary 
arithmetic, without thought. However, they had to be learnt, along 
with the addition and multiplication tables (which define the opera- 
tions), at some time in everyone’s early life. It is a very salutary 
exercise, and a very useful one in the subsequent development, to 
specify them precisely and to label them carefully for recognition 
later on. They are: 


† ‘Relatively prime’ means no common factor other than 1; see 3.1.


The Operational Rules of Algebra
For the set R = {a, b, c, ...} of rational numbers

Rule                 Addition (+)                      Multiplication (×)
1.  Closure          a + b belongs to R                a × b belongs to R
2.  Associative      a + (b + c) = (a + b) + c         a × (b × c) = (a × b) × c
3.  Commutative      a + b = b + a                     a × b = b × a
4.  Identity         Zero 0 such that                  Unity 1 such that
                     a + 0 = 0 + a = a                 a × 1 = 1 × a = a
5.  Inverse          Negative (−a) such that           Reciprocal a⁻¹ such that
                     a + (−a) = (−a) + a = 0           a × a⁻¹ = a⁻¹ × a = 1 (a ≠ 0)
5A. Cancellation     If a + b = a + c                  If a × b = a × c (a ≠ 0)
                     then b = c                        then b = c
6.  Distributive     a × (b + c) = a × b + a × c  and  (a + b) × c = a × c + b × c

Here a, b and c are any rational numbers, belonging to R. Rule 1
(closure) expresses the fact that R is self-contained, both for + and
for ×. Rule 2 (associative) indicates that the order of repeated
addition or multiplication is immaterial. Rule 3 (commutative)
indicates that, when two rationals are combined (by + or by ×), it is
a symmetrical operation: a with b is the same as b with a. This
reflects the fact that the operations are defined in a symmetric way.
Rule 4 (identity) shows that R contains a unique member which
doesn’t alter another member by addition, and another unique
member for multiplication. Rule 6 (distributive) links addition and
multiplication.
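The rules in the table can be spot-checked on a few rationals. The sketch below is an illustration under stated assumptions: the sample values are arbitrary, `rules_hold` is a hypothetical helper name, and a check on a handful of numbers is evidence only, not a proof that the rules hold throughout R.

```python
from fractions import Fraction
from itertools import product

# Arbitrary sample of rationals, including zero.
samples = [Fraction(-2, 3), Fraction(0), Fraction(1, 2), Fraction(5, 4)]

def rules_hold(a, b, c):
    """Check rules 2 to 6 of the table for one triple of rationals."""
    checks = [
        a + (b + c) == (a + b) + c,          # 2. associative (+)
        a * (b * c) == (a * b) * c,          # 2. associative (x)
        a + b == b + a,                      # 3. commutative (+)
        a * b == b * a,                      # 3. commutative (x)
        a + 0 == a and a * 1 == a,           # 4. identities 0 and 1
        a + (-a) == 0,                       # 5. negative
        a == 0 or a * (1 / a) == 1,          # 5. reciprocal, a != 0 only
        a * (b + c) == a * b + a * c,        # 6. distributive
    ]
    return all(checks)

all_ok = all(rules_hold(a, b, c) for a, b, c in product(samples, repeat=3))
print(all_ok)
```

Rule 1 (closure) is reflected in the fact that every intermediate result is again a `Fraction`.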
Rule 5 (inverse) is the one which allows subtraction (or division)
to be introduced, as undoing what is done by addition (or multiplication).
Every rational has its negative; the two add to zero. Now
define:

$a - b = a + (-b)$

where (−b) is the negative of b and the algebraic process known as
subtraction is accommodated. Similarly, define division in terms of
reciprocals:

$a/b = a \times b^{-1}$

where $b^{-1}$ is the reciprocal of b. The rule labelled 5A (cancellation) is
added in the table for particular reasons. First, it is a very important
consequence of 5 (together with the previous rules*). For multiplication:

Given:       a has reciprocal $a^{-1}$ (a ≠ 0).
From:        $a \times b = a \times c$ (a ≠ 0)
we get:      $a^{-1} \times (a \times b) = a^{-1} \times (a \times c)$
i.e. by 2:   $(a^{-1} \times a) \times b = (a^{-1} \times a) \times c$
i.e. by 5:   $1 \times b = 1 \times c$
i.e. by 4:   $b = c$, which is 5A.


Notice that it is in the writing of a reciprocal (rule 5), and equally in
cancelling a common element (rule 5A), that the exception a ≠ 0 must
be made. Another version of the cancellation rule is obtained from
5A by putting c = 0:

5B. If a × b = 0, then either a = 0 or b = 0.


Second, rule 5 is one which, for other systems, may very well not
hold. It is then important to know whether 5A holds or not. Though
5A follows from 5, the converse is not so: if 5A holds, 5 need not.
Hence 5A is a weaker version of 5. We may still be able to cancel,
even if we cannot write inverses. On the other hand, neither may
hold, in which case we have the (apparently curious) situation, the
negative of 5B:

There are non-zero a and b such that a × b = 0.

This is described by saying that there are ‘divisors of zero’: 0 divided
by b is a, and 0 divided by a is b.
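A concrete system with divisors of zero helps to fix the idea. The sketch below takes, as an assumption for illustration, the integers 0, 1, ..., n−1 with multiplication carried out modulo n (a system not yet introduced at this point of the text); the function name is hypothetical.

```python
def zero_divisors(n):
    """Pairs of non-zero a, b in {1, ..., n-1} whose product is 0 modulo n."""
    return [(a, b) for a in range(1, n) for b in range(1, n) if (a * b) % n == 0]

# Modulo 6 (composite) there are divisors of zero, e.g. 2 x 3 = 6 = 0.
print(zero_divisors(6))

# Modulo 7 (prime) there are none.
print(zero_divisors(7))

# With divisors of zero, cancellation 5A fails: 2 x 1 = 2 x 4 (mod 6) but 1 != 4.
a, b, c = 2, 1, 4
cancellation_fails = (a * b) % 6 == (a * c) % 6 and b != c
print(cancellation_fails)
```

The failure of 5A and the appearance of divisors of zero go together here, as the negative of 5B suggests.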

Fortunately, the set R of rationals is well-conducted; it has reciprocals,
cancellation can be done, and there are no divisors of zero.
Such a set of numbers, for which the whole list of operational rules
holds, is called a field. The field of rational numbers is the first of
several fields we shall meet.

Rational numbers have the equally important property of being
ordered; the set R is not only a field, it is an ordered field. The
properties of order are well-known in elementary arithmetic but it is
again a useful exercise to spell them out and to label them. The order
symbol ‘<’ is used for ‘less than’:


* Only the commutative rule 3 is not used, an important point later. 


The Properties of Order
For the set R = {a, b, c, ...} of rational numbers

Name                 Property
(i)   Trichotomy     One and only one of a < b, a = b, b < a holds
(ii)  Transitivity   If a < b and b < c, then a < c
(iii) Density        If a < b, then c exists so that a < c < b
(iv)  Extension      For any a, b exists so that a < b
                     and c exists so that c < a
(v)   Consistency    If a < b, then a + c < b + c for any c
                     and a × c < b × c for any positive c


There is an alternative symbol: ‘>’ for ‘greater than’. It is in common
use but it adds nothing to the set of properties. Switch a and b around
and ‘<’ is replaced by ‘>’: if a < b then b > a. The transitive property,
for example, then reads: if a > b and b > c, then a > c.

The development of real numbers, in 2.4 below, is based on the
concept of the rationals as an ordered sequence. The properties of
order are further examined, in more general contexts, in connection
with ordered fields (6.7 below) and in specifying order as a relation
(7.7 below).
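The order properties lend themselves to a quick check with exact rationals. In the sketch below the values of a, b and the multipliers are arbitrary illustrations; the final line shows why the consistency property (v) is stated for positive c only.

```python
from fractions import Fraction

a, b = Fraction(1, 3), Fraction(1, 2)   # illustrative pair with a < b

# (iii) Density: the mean lies strictly between a and b.
between = (a + b) / 2
assert a < between < b

# (iv) Extension: something larger than b and something smaller than a.
above, below = b + 1, a - 1
assert below < a and b < above

# (v) Consistency: adding any c, or multiplying by a positive c, keeps the order.
c_any, c_pos = Fraction(-7, 5), Fraction(3, 4)
consistent = (a + c_any < b + c_any) and (a * c_pos < b * c_pos)

# Multiplying by a negative number reverses the order instead.
reversed_by_negative = a * c_any > b * c_any
print(between, consistent, reversed_by_negative)
```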


2.3. A wider field of numbers. Rational numbers serve for most
purposes of arithmetic but are soon found to be inadequate in
algebra, e.g. in solving quadratic equations. Consider the ‘number’
$\sqrt{2}$, the extraction of the square root of 2 in the solution of the simple
quadratic $x^2 - 2 = 0$. This ‘number’ arises in many ways; for example,
by Pythagoras’ Theorem, it is the length of the diagonal of a square
of unit side. It can be shown, quite formally, that $\sqrt{2}$ is not a rational
number, as in the following reductio ad absurdum proof. Suppose $\sqrt{2}$
is rational and write it p/q for certain integers p and q. Then
$(p/q)^2 = 2$, or $p^2 = 2q^2$. If q is odd, $q^2$ has no factor 2; if q is even, $q^2$ has
an even number of factors 2. Hence, $p^2 = 2q^2$ has 2 as a factor an odd
number (1, 3, 5, ...) of times. This is impossible, whether p is odd or
even. Hence $\sqrt{2}$ is not rational.

Many other polynomial equations fail to have rational solutions
in the same way as $x^2 - 2 = 0$. This is an unsatisfactory situation,
needing correction. The way out, in elementary arithmetic, is to
carry out certain computations and to write, for example, $\sqrt{2} = 1.414$.
But this is merely a rational approximation 1414/1000 = 707/500 to
$\sqrt{2}$; it is not $\sqrt{2}$ itself. What is needed, clearly, is an extension of the
system of rational numbers, so that equations like $x^2 - 2 = 0$ can be
solved, and so that the operational rules and properties of 2.2 are
preserved. The full extension is given in 2.4; meanwhile here is a
method which suggests itself.
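Both points can be looked at numerically. The sketch below, in which the search bound is an arbitrary assumption, looks for integers p and q with p² = 2q² and finds none, and then measures exactly how far the square of 707/500 falls short of 2. A finite search proves nothing by itself; it merely agrees with the proof already given.

```python
from fractions import Fraction
from math import isqrt

# Search for whole numbers p, q with p*p == 2*q*q, for q up to an arbitrary bound.
BOUND = 10_000
solutions = [
    (isqrt(2 * q * q), q)
    for q in range(1, BOUND + 1)
    if isqrt(2 * q * q) ** 2 == 2 * q * q
]
print(solutions)   # no pair is found

# The rational approximation 1414/1000 reduces to 707/500 ...
approx = Fraction(1414, 1000)
print(approx)

# ... and its square misses 2 by a small but non-zero rational amount.
gap = 2 - approx ** 2
print(gap, float(gap))
```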

The root of $x^2 - 2 = 0$ is not a rational in the field R. Define it as a
number of a new kind and write it $\sqrt{2}$. (Strictly, there are two roots
$\pm\sqrt{2}$, but it is enough to take the positive $\sqrt{2}$ and allow the negative
$-\sqrt{2}$ to arise automatically.) To all the rationals of R, throw in the
new number $\sqrt{2}$ and form all the necessary sums and products to
ensure that the wider set of numbers is still a field, satisfying the
rules of 2.2. This process, if it can be carried through, is called the
adjunction of the new element $\sqrt{2}$ to the field R. It produces a new
and wider field, denoted $R(\sqrt{2})$. It is a very general and useful procedure,
and other examples are met later.

It remains to show that the object can be achieved in this case.
The new number $\sqrt{2}$ must combine with the old ones (rationals);
we must consider $a + b\sqrt{2}$ as a new number*, multiplying $\sqrt{2}$ by the
rational b and then adding the rational a. But this is not all. We still
have to combine the new numbers amongst themselves, to ensure
closure in the wider set. This requires a re-definition of addition and
multiplication, for numbers of the form $a + b\sqrt{2}$. The definition is not
arbitrary, since it is designed with a sharp eye on what we want to
achieve:

$(a + b\sqrt{2}) + (c + d\sqrt{2}) = (a + c) + (b + d)\sqrt{2}$
$(a + b\sqrt{2})(c + d\sqrt{2}) = ac + (ad + bc)\sqrt{2} + bd(\sqrt{2})^2$        (1)
$\qquad\qquad\qquad\qquad\quad = (ac + 2bd) + (ad + bc)\sqrt{2}$

since by definition $(\sqrt{2})^2 = 2$. We have now achieved closure, for the
sums and products defined by (1) are themselves of the same form:

(rational number) + (rational number) × $\sqrt{2}$.

Finally, we check all the other rules of 2.2, one by one, to see that
they still hold for $R(\sqrt{2})$ made up of numbers of form $a + b\sqrt{2}$. It is
found that they do (2.9 Ex. 6). Take rules 4 and 5 for multiplication
as an illustration. The unity of $R(\sqrt{2})$ is still 1 (i.e. $1 + 0\sqrt{2}$) since
putting c = 1, d = 0 in (1) gives:

$(a + b\sqrt{2}) \times 1 = a + b\sqrt{2}.$

To get the reciprocal of $(a + b\sqrt{2})$, notice that another application of
(1) gives:

$(a + b\sqrt{2})(a - b\sqrt{2}) = a^2 - b^2(\sqrt{2})^2 = a^2 - 2b^2$

or: $(a + b\sqrt{2}) \times \dfrac{a - b\sqrt{2}}{a^2 - 2b^2} = 1.$

Hence $a + b\sqrt{2}$ has a reciprocal $\dfrac{a - b\sqrt{2}}{a^2 - 2b^2} = \left(\dfrac{a}{a^2 - 2b^2}\right) + \left(\dfrac{-b}{a^2 - 2b^2}\right)\sqrt{2}$,
which is of the same form. (The denominator $a^2 - 2b^2$ cannot be zero for
rational a and b, not both zero, since $\sqrt{2}$ is not rational.)

* Here and later, if no ambiguity arises, we drop the × sign for multiplication.
Hence, ab means a × b, $a + b\sqrt{2}$ means $a + b \times \sqrt{2}$, and so on.

A new field of numbers $R(\sqrt{2})$, comprising all numbers of the form
$a + b\sqrt{2}$ (a and b rationals), is thus created. It contains the old field R
of rationals (the special cases with b = 0), it contains many new
numbers (those involving $\sqrt{2}$), it satisfies all the operational rules and
it can be operated upon algebraically in the familiar way. Some
quadratic equations can now be solved, with roots in $R(\sqrt{2})$. One of
them is $x^2 - 2 = 0$, with roots $x = \pm\sqrt{2}$. Another is

$2x^2 - 4x + 1 = 0.$

The process of completing the square gives:

$2x^2 - 4x + 1 = 2(x^2 - 2x + 1) - 1 = 2(x - 1)^2 - 1$

and the equation becomes:

$2(x - 1)^2 = 1$ or $(x - 1)^2 = 2/4$ or $x - 1 = \pm\tfrac{1}{2}\sqrt{2}.$

The roots are $x = 1 \pm \tfrac{1}{2}\sqrt{2}$, of the form $a + b\sqrt{2}$ of the field $R(\sqrt{2})$.
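Arithmetic in R(√2) can be carried out exactly by storing the pair of rationals (a, b) that stands for a + b√2. The sketch below is one possible arrangement, not a standard library facility: the class name `QSqrt2` and the helper `q` are hypothetical, and the product and reciprocal follow the formulas derived above.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class QSqrt2:
    """The number a + b*sqrt(2), with a and b rational."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def reciprocal(self):
        # (a - b√2)/(a^2 - 2b^2); the denominator is non-zero unless a = b = 0
        norm = self.a ** 2 - 2 * self.b ** 2
        return QSqrt2(self.a / norm, -self.b / norm)

def q(a, b=0):
    """Build a + b*sqrt(2) from integers or fractions."""
    return QSqrt2(Fraction(a), Fraction(b))

# Rule 5: a number times its reciprocal gives the unity 1 + 0√2.
x = q(3, 2)
assert x * x.reciprocal() == q(1)

# The roots 1 ± (1/2)√2 satisfy 2x^2 - 4x + 1 = 0 exactly.
def quadratic(x):
    return q(2) * x * x + q(-4) * x + q(1)

roots = [q(1, Fraction(1, 2)), q(1, Fraction(-1, 2))]
print([quadratic(root) for root in roots])
```

Every result is again a pair of rationals, which is the closure property in computational dress.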

Unfortunately, the wider field $R(\sqrt{2})$ does not take us very far,
even with quadratic equations. There are many still without solutions;
$x^2 - 3 = 0$ is a simple case. We could attempt to pursue the line
of development: the adjunction of $\sqrt{2}$ to the field R gives a wider
field of numbers $R(\sqrt{2})$; the adjunction of $\sqrt{3}$ to $R(\sqrt{2})$ gives a still
wider field of numbers $R(\sqrt{2}, \sqrt{3})$; and so on. This turns out to be an
impossibly protracted line. Having noted that a wider field can be 
got by adding a new number to an existing field, we look for some 
more powerful and corner-cutting method of extending the number 
system. The extension adopted (2.4) is a difficult one to achieve. The 
subsequent further extension (2.5) then gets by with the adjunction 
of a single new number on the lines here indicated. 


2.4. Real numbers. View the set R of rational numbers as an ordered
sequence with the order properties of 2.2. R is indefinitely ‘dense’
in the sense of property (iii): select two rationals a and b, no matter
how close, and there is always room for a third rational c to fall
between them, for example $c = \tfrac{1}{2}(a + b)$. Further, R is indefinitely
‘extended’ in the sense of property (iv): for any rational a, no matter
how large, there is always a larger b > a (and similarly the other way).

It is useful as an illustration to represent rational numbers as
points on a line drawn horizontally as in Fig. 2.4a. This is a directed
line, increasing rationals being shown by points arranged in order from left
to right. Since R is indefinitely extended both ways, the line must run
indefinitely to the left and to the right from O,
the point selected to represent zero. The property of indefinite density
corresponds to the fact that a point can be inserted on the line between
any two given points. To illustrate, take mid-points and insert $A_1$
between A and B, then $A_2$ between $A_1$ and B, then $A_3$ between $A_2$
and B, .... The rationals of R are shown in order as points on the
line, yet indefinitely extended both ways and indefinitely dense.

[Fig. 2.4a: a horizontal directed line marked with the points O, A, B and the inserted mid-points $A_1$, $A_2$, $A_3$]

We need an extension to allow for ‘numbers’ like $\sqrt{2}$ which are not
rational. An arithmetic approach would be: represent a rational p/q as
a decimal. If q has only 2’s and 5’s as factors (apart from those also
factors of p), the decimal terminates, e.g. 71/50 = 1.42. Otherwise the
decimal recurs, e.g. 22/7 = 3.142857 (the block 142857 recurring). It is very tempting to say that
a decimal which neither terminates nor recurs gives a new number,
an ‘irrational’, not represented in the form p/q. The process
of root extraction gives $\sqrt{2} = 1.4142\dots$ without terminating or
recurring, so that $\sqrt{2}$ is an ‘irrational’. The whole set of rationals and
‘irrationals’ together then comprise the set of ‘real numbers’, wider
than the set of rationals.
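The decimal of a rational p/q can be produced by ordinary long division, and the repetition of a remainder is what forces the decimal to recur. The sketch below is an illustration with a hypothetical function name; it assumes p ≥ 0 and q > 0 and shows the recurring block in parentheses.

```python
def decimal_expansion(p, q):
    """Decimal of p/q (p >= 0, q > 0); a recurring block is put in parentheses."""
    whole, remainder = divmod(p, q)
    digits = []
    seen = {}   # remainder -> position in digits where it first occurred
    while remainder != 0 and remainder not in seen:
        seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, q)
        digits.append(str(digit))
    if remainder == 0:
        # The division came out exactly: a terminating decimal.
        return f"{whole}." + "".join(digits) if digits else str(whole)
    # A remainder has repeated: the digits from its first occurrence recur.
    start = seen[remainder]
    return f"{whole}." + "".join(digits[:start]) + "(" + "".join(digits[start:]) + ")"

print(decimal_expansion(71, 50))   # terminates: denominator has only 2's and 5's
print(decimal_expansion(22, 7))    # recurs with a block of six digits
print(decimal_expansion(1, 6))     # one non-recurring digit, then a recurring one
```

Since only the remainders 1, 2, ..., q−1 are available, a repetition must occur within q steps; this is why every rational either terminates or recurs.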

The difficulty is that we may never know for certain whether a 
decimal stops or recurs. The definition suggested is too loose to serve 
as a sound basis for real numbers; and we do need a very firm 
foundation, even for purposes of practical mathematics. It is not 
only a matter of tidying up the solution of quadratics and other 
equations. The decisive consideration is that real numbers are
essential for the notion of a limit and so for breaking through from
algebra to the calculus. Without a firm basis for real numbers, we
cannot hope to establish the concept of a limit and, hence, to bring 
in the whole of the powerful apparatus of the calculus. 

The strict definition of a real number is one of the more difficult 
exercises in basic mathematics. Consider three variants of what is 
essentially the same process. The first uses a sequence of rational 
numbers and the idea that a sequence may have a limit. It brings in, 
therefore, the idea of a limit, so important later on, and does so 
explicitly. A particular form of sequence suggests itself: the familiar 
process of successive approximation to a number by taking more 
decimal places. Two cases illustrate: 

(i) The rational 22/7 can be approximated from below by the
increasing sequence:

3.1; 3.12; 3.14; 3.141; 3.142; ...

or from above by the decreasing sequence:

3.2; 3.17; 3.15; 3.145; 3.143; ...

Represent these rationals, in pairs, on successive lines, as in Fig. 2.4b.
The first pair is 3.1 and 3.2; the second pair is 3.12 and 3.17; and so on.
Either sequence, continued indefinitely, tends to 22/7 in the limit.
So does any ‘crossed’ sequence such as:

3.1; 3.17; 3.14; 3.145; 3.142; ...

which is neither increasing nor decreasing.

[Fig. 2.4b: successive pairs of approximations to 22/7 marked on successive lines, on a scale running from 3.1 to 3.2]

(ii) The irrational $\sqrt{2}$, i.e. the positive number a such that $a^2 = 2$,
can be approximated by an increasing sequence from below:

a      1.4     1.40     1.41      1.412       1.414       ...
$a^2$    1.96    1.96     1.9881    1.993744    1.999396    ...

or by a decreasing sequence from above:

a      1.5     1.45     1.42      1.417       1.415       ...
$a^2$    2.25    2.1025   2.0164    2.007889    2.002225    ...

or by a ‘crossed’ sequence such as:

1.5; 1.40; 1.42; 1.412; 1.415; ...

All sequences tend to the same limit, which represents $\sqrt{2}$. A diagram
similar to Fig. 2.4b can be drawn in this case.
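The squares in the two tables for √2 can be recomputed exactly by treating each decimal as a rational. The sketch below uses the sequence values quoted in the text; the variable names are chosen here for illustration.

```python
from fractions import Fraction

below = ["1.4", "1.40", "1.41", "1.412", "1.414"]   # increasing, from below
above = ["1.5", "1.45", "1.42", "1.417", "1.415"]   # decreasing, from above

# Fraction("1.41") is the exact rational 141/100, so the squares are exact.
squares_below = [Fraction(s) ** 2 for s in below]
squares_above = [Fraction(s) ** 2 for s in above]

for s, sq in zip(below, squares_below):
    print(s, float(sq))
for s, sq in zip(above, squares_above):
    print(s, float(sq))

# Every square from below is less than 2, every square from above is greater.
brackets_two = all(lo < 2 < hi for lo, hi in zip(squares_below, squares_above))
print(brackets_two)
```

No term of either sequence has a square equal to 2, in line with the fact that the limit is not itself a rational.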

The suggestion here is that, if a sequence of rationals has a limit,
then the limit may also be a rational; but it can equally well be an
irrational. There is a first possible definition of a real number:

A real number α is the limit of a sequence of rational numbers $a_n$
as the integer n increases indefinitely.

The system of real numbers, on such a definition, is wider than but
includes all rationals. The additional cases are irrationals such as $\sqrt{2}$.

The second variant uses a sequence of nested intervals so arranged 
that they shrink to a final residue which is a single point. This is a 
slight but definite improvement on the first variant, as we see when 
we come to define a limit. Write the interval [a, b] to denote all
rationals x such that a ≤ x ≤ b. Take a sequence of nested intervals,
each contained within the previous one. In case (i), such a sequence
is:

[3.1, 3.2]; [3.12, 3.17]; [3.14, 3.15]; [3.141, 3.145]; [3.142, 3.143]; ...
as represented in Fig. 2.4c. In case (ii) the sequence:

[1.4, 1.5]; [1.40, 1.45]; [1.41, 1.42]; [1.412, 1.417]; [1.414, 1.415]; ...
would serve; it can be represented on a diagram similar to Fig. 2.4c.
In each case, the sequence of nested intervals shrinks to a single
point, which may be a rational (e.g. 22/7) or an irrational (e.g. √2).
This suggests a second possible definition of a real number:

[Fig. 2.4c]

A real number α is the final residue of a sequence of nested
intervals [aₙ, bₙ] which shrink to a single point as n increases
indefinitely; α is contained in all intervals.
Again the system of real numbers, so defined, extends and includes 
the rationals. 
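
A sequence of nested intervals is easy to generate mechanically. The sketch below is not the decimal construction used in the text; it assumes repeated halving (bisection) instead, starting from the interval [1, 2], and the function name `nested_intervals` is our own. Each interval has rational end-points and contains the point whose square is 2.

```python
from fractions import Fraction

def nested_intervals(steps):
    """Yield rational intervals [a, b] with a*a < 2 < b*b, halving each time."""
    a, b = Fraction(1), Fraction(2)
    for _ in range(steps):
        yield a, b
        mid = (a + b) / 2
        # mid is rational, so mid*mid can never equal 2 exactly.
        if mid * mid < 2:
            a = mid
        else:
            b = mid

intervals = list(nested_intervals(10))

# Each interval lies inside the previous one.
for (a1, b1), (a2, b2) in zip(intervals, intervals[1:]):
    assert a1 <= a2 and b2 <= b1

for a, b in intervals[-3:]:
    print(a, b, float(b - a))
```

The lengths halve at every step, so the intervals shrink to a single point; no rational end-point ever coincides with it.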

The third variant proceeds by specifying a cut (or section) of the
set R of rationals into two parts, L and G, such that L is less than
(to the left of) G. The cut can be made in any way provided that:

(1) each rational is in L or in G

(2) each of L and G contains at least one rational
and

(3) each rational of L is less than each rational of G.
An obvious way to specify a cut is by an inequality. In case (i), take
L as all rationals x such that 7x < 22 and G as all rationals x such that
7x ≥ 22. There is a dividing point between L and G, the point α at the
cut, and here α = 22/7. It happens that α belongs to G in this
specification. This is accidental; a slightly different cut is got by taking x
in L if 7x ≤ 22 and in G if 7x > 22, and α = 22/7 belongs to L. In case
(ii), take zero and all negative rationals in L together with all positive
rationals x such that x² < 2; G comprises all positive rationals x such that
x² > 2. There is again a dividing point α, but it is not a rational,
neither in L nor in G. It is an irrational number, in fact √2, which
plugs a gap left in the rationals between L and G. This suggests a
third possible definition of a real number, the one we adopt:


DEFINITION: A real number α is the point dividing the parts L and
G of a cut of the set R of rationals satisfying (1), (2) and (3) above.
The system of real numbers so defined again includes the rationals. 
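
A cut can be modelled by a rule which says whether a given rational falls in L; everything else is in G. The sketch below is illustrative only: the function names are our own, zero is placed in L for case (ii) as an assumption, and conditions (2) and (3) are checked on a finite sample of rationals, which is a spot check and not a proof.

```python
from fractions import Fraction

def in_L_22_7(x):
    """Lower part of the cut at 22/7: rationals x with 7x < 22."""
    return 7 * x < 22

def in_L_root2(x):
    """Lower part of the cut at sqrt(2): x <= 0, or x > 0 with x*x < 2."""
    return x <= 0 or x * x < 2

def check_cut(in_L, samples):
    """Check conditions (2) and (3) on a sample; (1) holds by construction."""
    L = [x for x in samples if in_L(x)]
    G = [x for x in samples if not in_L(x)]
    return bool(L) and bool(G) and max(L) < min(G)

# Sample rationals n/8 between -5 and 5.
samples = [Fraction(n, 8) for n in range(-40, 41)]

print(check_cut(in_L_22_7, samples), check_cut(in_L_root2, samples))
# 22/7 is itself a rational and, with this rule, falls in G.
print(in_L_22_7(Fraction(22, 7)))
```

In the first cut the dividing point is a rational which must be assigned to one side; in the second, no rational is available to play that part.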

The definition adopted has advantages over the others suggested.
The cut specified includes and extends the idea of a sequence. As
illustrated in Fig. 2.4d, an increasing sequence of rationals L₁, L₂,
L₃, L₄, ... approximating to α from below is simply a sequence
selected from L; similarly for a decreasing sequence in G. Hence,
L or G serves to fill out particular sequences, to comprehend them
all. The reason for adopting the definition, however, is that it is
best designed to establish the properties of real numbers:

[Fig. 2.4d: points L₁, L₂, L₃, L₄ of L to the left of α; points G₄, G₃, G₂, G₁ of G to the right]

(a) Addition and multiplication can be re-defined for real numbers 
to satisfy the same operational rules as for rationals, i.e. real 
numbers form a field. 

(b) Order by < (less than) can be re-defined for real numbers to
have the same order properties as for rationals, i.e. real numbers
form an ordered field.

(c) If a cut is made in the set of real numbers, the dividing point 
is always a real number and nothing new is produced, i.e. the 
real numbers form a complete ordered field. 




The formal definition of a real number, and an indication of how the
properties (a), (b) and (c) are established, are given in 15.1.

We assume here that this is all achieved. We have then the com- 
plete ordered field of real numbers, denoted R*. The approach we 
have adopted makes the properties appear sensible. Properties (a) 
and (b) imply that real numbers extend the system of rational 
numbers (filling the gaps in them) and that, at the same time, they 
preserve the operational rules and order properties of rationals. 
Property (c) is the new idea; it is the result of a basic theorem 
derived from the definition of real numbers. It is a property much to 
be desired. The implication is the following. If a cut of the rationals 
is made, a rational may be produced, but equally a new (irrational) 
number. Then, if a cut of the real numbers is made, nothing more 
emerges, just a real number. The real numbers are complete; no 
more gaps remain to be filled. We have reached the end of a line of 
development. 

The property of completeness, distinguishing real from rational 
numbers, is the one required to achieve the passage from algebra to 
calculus. It is matched by a corresponding completeness of points 
on a directed line. We have the continuum of real numbers, of points 
on a line. The completeness property is to be put into another form, 
one of particular convenience. First, a definition: 


DEFINITION: A set S of real numbers has a lower bound a′ if a′ ≤ x
for each x in S, and a greatest lower bound (GLB) a if a is a lower
bound and no real number x > a is a lower bound of S. An upper bound b′
and a least upper bound (LUB) b are defined similarly.


Sets of real numbers need not have a GLB or LUB. For example, the
set of positive real numbers has no LUB and the set of integers has
no GLB or LUB. But, if a set S of real numbers does have GLB a
and LUB b, then a ≤ x ≤ b for all x of S. Notice, however, that a and b
themselves may or may not belong to S. If S has GLB a, then S may
contain a (its least member), but it may not. For example, S consisting
of positive x such that x² > 2 has GLB a = √2, not contained in S. Vary
slightly and specify S as comprising positive x such that x² ≥ 2; S still
has GLB a = √2, now contained in S. It is of little significance whether a
set includes its GLB (or LUB) or not; it is a matter which must not
be allowed to cause any confusion.




The following result is a consequence of the complete ordered 
property of the set R* of all real numbers: 


THEOREM: If a set S of real numbers has a lower bound, then it has a
GLB; if S has an upper bound, then it has a LUB.


This result should now appear both obvious and useful. To see what
we have achieved, however, notice that a similar result is not true of
rationals. The set of positive rationals x such that x² > 2 certainly has
a lower bound (e.g. 0 or 1) but it has no rational as GLB; if a (rational)
is any positive lower bound (a² < 2) there are always rationals b > a such
that b² < 2, i.e. b is also a lower bound. It takes the definition of the
real number √2 to provide a GLB. See 2.9 Ex. 9.
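
The claim that a rational lower bound can always be bettered can be made concrete. The sketch below uses one particular rule, b = (2a + 2)/(a + 2), which is our own choice and not taken from the text; for a positive rational a with a² < 2 it gives b − a = (2 − a²)/(a + 2) > 0 and b² − 2 = 2(a² − 2)/(a + 2)² < 0, so b is a larger lower bound.

```python
from fractions import Fraction

def larger_lower_bound(a):
    """Given rational a > 0 with a*a < 2, return rational b > a with b*b < 2."""
    return (2 * a + 2) / (a + 2)

a = Fraction(1)
chain = [a]
for _ in range(5):
    a = larger_lower_bound(a)
    chain.append(a)

for x in chain:
    print(x, float(x * x))
```

Every term is a lower bound of the set of positive rationals with square greater than 2, and every term is beaten by the next; no rational can be the greatest.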

What do we mean when we say that we have now taken a decisive
step forward? Certainly, we have provided a proper conceptual basis
for much of practical arithmetic. A square of unit side has diagonal
√2; the area of a circle of radius r is πr². In dealing with these things,
we cheat a little; we approximate √2 = 1.414 and π = 22/7, or whatever
we find convenient. Moreover, in drawing the graph of the
curve y = x² − 2, we again cheat by giving x a suitable sequence of
rational values, by finding the corresponding rational values of y
and finally by putting a smooth curve through the plotted points.
We can now see the extent of the cheating involved. But the reason
why the step forward is so decisive is that the way is clear for an
advance into new territory, that of the calculus and mathematical
analysis. It is essential to write y as a function of x, where x is a
continuous or real variable; rational values of x will just not do.
This is as essential in practice as in theory. Practical mathematics,
as used by the physicist or engineer, cannot do without real numbers.


2.5. Complex numbers. The real numbers are a completely ordered
set, shown by the continuum of points on a line. There is nothing to
be added to the order of the numbers or points. If we go further, we
must give up the order by greater and less and we must go outside the
line of one dimension. It is easily seen, however, that we do need to
go further. Some quadratic equations do have real roots, e.g.
2x² − x − 3 = 0 gives x = 3/2 or −1 and x² − 2 = 0 gives x = ±√2. But
we are still left with some quadratics on our hands, e.g. x² + x + 1 = 0,
with roots which may be described as ‘imaginary’ (1.6 above). In this
unsatisfactory situation, the obvious line to take is to extend the
number system, again in such a way that the operational rules of 2.2
are preserved. The object is to extend the field R* of real numbers to
a wider field in which all equations are solved. The new field will not
be ordered; it cannot be, since R* is complete. But the price, the loss
of order, is worth paying.

The neatest way of extending R* is by use of the idea of adjunction
(2.3). Since x² + 1 > 0 for all real x, the equation x² + 1 = 0 has no real
solution. Introduce a new number i which is a root of the equation,
so that i² = −1. Into the set R*, closed under the operations of
addition (+) and multiplication (×), we throw the new number i.
We impose the same operations of + and × on combinations of i
with real numbers a, b, c, d, .... We write a + ib = a + i × b and we add
this to, or multiply it by, another of its kind (c + id = c + i × d)
according to the rules:

(a + ib) + (c + id) = (a + c) + i(b + d)
(a + ib) × (c + id) = ac + i(ad + bc) + i²bd                    (1)
                    = (ac − bd) + i(ad + bc)
This amounts to a re-definition of sums and products to accommodate
the new number i, and it makes use of the fact that i is the number
such that i² = −1.

Consider the set of numbers a + ib, where a and b are any real
numbers and where i² = −1. Addition and multiplication are closed
for this set, since (1) shows that adding (or multiplying) two such
numbers gives another number of the same form. The unity of the
new set is still 1 = 1 + i0 since:

(a + ib) × 1 = a + ib   by (1) with c = 1, d = 0.
Further, another application of (1) gives: (a + ib) × (a − ib) = a² + b².
Hence, provided a and b are not both zero:

(a + ib) × {a/(a² + b²) − ib/(a² + b²)} = 1

and the reciprocal of a + ib exists as a/(a² + b²) − ib/(a² + b²), a
member of the same set. We can write:

1/(a + ib) = (a − ib)/{(a + ib)(a − ib)} = (a − ib)/(a² + b²)

on clearing the denominator of i (see Appendix A.6). All the other
operational rules of 2.2 apply, as can easily be checked (2.9 Ex. 6).
The set of numbers a + ib forms a field.
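
The rule for products and the formula for the reciprocal can be tried out numerically. In the sketch below, which is an illustration under our own naming, a number a + ib is held as the tuple `(a, b)` of exact rationals, `mul` applies the product rule of (1), and `reciprocal` applies the formula just derived.

```python
from fractions import Fraction

def mul(z, w):
    """Product rule of (1): (a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def reciprocal(z):
    """Reciprocal a/(a*a + b*b) - i b/(a*a + b*b); a and b not both zero."""
    a, b = z
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 + i0 has no reciprocal")
    return (a / n, -b / n)

z = (Fraction(3), Fraction(4))
conjugate = (z[0], -z[1])

print(mul(z, conjugate))      # (a*a + b*b, 0)
print(reciprocal(z))
print(mul(z, reciprocal(z)))  # the unity 1 + i0
```

Exact rationals are used so that the product with the reciprocal comes out as exactly 1 + i0, with no rounding to explain away.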

The object is achieved. The adjunction of just one new element i
(where i² = −1) to the field R* of real numbers, with + and × re-defined
as in (1), produces a new field of numbers a + ib (a and b real).
We have the field C of complex numbers a + ib. C includes R*, since
a + ib is the real number a when b = 0.
The process of solving the quadratic equation ax² + bx + c = 0 is
now complete. The solution offered in 1.6, x = {−b ± √(b² − 4ac)}/2a,
gives two real values when b² ≥ 4ac (coincident when b² = 4ac). It
gives two complex values x = {−b ± i√(4ac − b²)}/2a when b² < 4ac.
For example, x² + x + 1 = 0 has roots: x = (−1 ± i√3)/2.
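
The formula can be applied without first asking which case arises. The sketch below assumes Python's standard `cmath` module, whose `sqrt` returns a complex value for a negative argument; the function name `quadratic_roots` is our own, and the arithmetic is in floating point, so results are approximate.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 for real a != 0, b, c."""
    root = cmath.sqrt(b * b - 4 * a * c)   # real or purely imaginary
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

# b*b < 4ac: two complex roots (-1 +/- i*sqrt(3))/2.
r1, r2 = quadratic_roots(1, 1, 1)
print(r1, r2)

# b*b > 4ac: two real roots, 3/2 and -1, returned with zero imaginary part.
s1, s2 = quadratic_roots(2, -1, -3)
print(s1, s2)
```

Substituting each root back into its quadratic gives zero to within rounding error.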

The field C has a remarkable property. The fundamental theorem
of algebra (Chapter 3) shows that all polynomial equations† are
solved within the field of complex numbers a + ib, including real
numbers (b = 0). Once again we are at the end of the line. For ordinary
algebra, we require more than the field R* of real numbers, but the
field C of complex numbers is the biggest we need. The adjunction
of just one new element, i as a root of x² + 1 = 0, turns the trick.
We can then solve, not only quadratics, but all polynomial equations.
On the other hand, though R* is ordered, C is not. We cannot place
i, the new number, in any order amongst real numbers. Graphically,
i cannot be shown on the line of real numbers; if it is to appear at all,
it requires an extra dimension.

The definition of complex numbers on these lines is neat but 
abstract. We get no idea of how complex numbers can be applied, or 
of their graphical representation — apart from the negative result 
that they need an extra dimension. Hence we give, as our definition 
of complex numbers, a more pedestrian concept, but one which 
enables us to bring in two constructions of the utmost importance 
in themselves: vectors and number pairs. 


DEFINITION: The complex number z is the pair (x, y) of real numbers
subject to the rules for equality (=), sums (+) and products (×):

(x₁, y₁) = (x₂, y₂) implies x₁ = x₂ and y₁ = y₂
(x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂)                        (2)
(x₁, y₁) × (x₂, y₂) = (x₁x₂ − y₁y₂, x₁y₂ + x₂y₁)


† With rational, or indeed real or complex, coefficients.




The rules (2) for sums and products are by no means arbitrary. On 
the contrary, they are cunningly devised to agree with (1), i.e. so that 
the operational rules of 2.2 are obeyed. Later, we have little difficulty 
in establishing that the complex numbers z, so defined, form a field. 
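
The rules (2) translate directly into a small data type. The sketch below is a spot check under our own naming (`Pair` is hypothetical, and integer components are used only to keep the arithmetic exact); it tests a few of the operational rules on sample pairs, which illustrates the field properties but does not establish them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    """A number pair (x, y) with the sum and product rules (2)."""
    x: int
    y: int

    def __add__(self, other):
        return Pair(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        return Pair(self.x * other.x - self.y * other.y,
                    self.x * other.y + other.x * self.y)

z1, z2, z3 = Pair(1, 2), Pair(3, -1), Pair(-2, 5)

assert z1 * z2 == z2 * z1                      # commutative
assert (z1 * z2) * z3 == z1 * (z2 * z3)        # associative
assert z1 * (z2 + z3) == z1 * z2 + z1 * z3     # distributive
assert Pair(1, 0) * z1 == z1                   # unity
assert Pair(0, 1) * Pair(0, 1) == Pair(-1, 0)  # the pair (0, 1) squares to (-1, 0)

print(z1 * z2)
```

Equality of pairs is the component-by-component equality supplied by the dataclass, which matches the first rule of (2).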

A graphical interpretation is given in Fig. 2.5a, a representation
known as an Argand Diagram, after Argand (1768-1822). The ordering
of the number pair z = (x, y) is essential: x first, y second. The pair
then serves as the co-ordinates of a point P in a plane, referred to
axes Oxy and with a scale of measurement fixed on each. The unit
circle (centre O and unit radius) is drawn to indicate magnitudes; it
cuts the axes in A, B, A′ and B′. The complex number z is
shown either by the point P (x, y) or
by the vector OP. P is located by taking OM = x and ON = y.
Alternatively, the location of P is by the length r of the vector OP and
the angle θ it makes with Ox. Here θ is a number, the measure of
the angle in convenient units (e.g. in degrees). By elementary
trigonometry, x = r cos θ and y = r sin θ so that r² = x² + y² and tan θ = y/x
(see Appendix A.9). Hence:

Notation: The absolute value or modulus of the complex number
z = (x, y) is r = √(x² + y²); and its argument or amplitude is the angle θ
such that x = r cos θ and y = r sin θ.

The absolute value is also written | z |, as indicated in 1.7 above.

[Fig. 2.5a]   [Fig. 2.5b]

The rules (2) imposed on complex numbers have their graphical
interpretation on an Argand Diagram (see 2.9 Ex. 11 and 13). First,
it takes two things to fix the location of a point in a plane. Hence,
z₁ = (x₁, y₁) is equal to z₂ = (x₂, y₂) only if both
conditions, x₁ = x₂ and y₁ = y₂, hold; the corresponding
points P₁ and P₂ coincide only if both
co-ordinates are the same. Second, the vectors
OP₁ and OP₂ add to the vector OP (for
z = z₁ + z₂), where OP is the resultant of OP₁
and OP₂ in the sense (familiar in mechanics)
that OP is the diagonal of the parallelogram
formed from OP₁ and OP₂ (Fig. 2.5b). Third,
the vectors OP₁ and OP₂ multiply to the vector
OP (for z = z₁ × z₂) where OP has absolute value
r = r₁ × r₂ and argument θ = θ₁ + θ₂. Multiplication
combines expansion and rotation of vectors: the
lengths multiply and the angles add (Fig. 2.5c).
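
The statement that lengths multiply and angles add can be checked with the standard `cmath` module. The sketch below is illustrative: the two sample vectors are our own choice, `cmath.rect` builds a complex number from r and θ, and `cmath.polar` recovers them. Angles are in radians, and `cmath.polar` reports the angle in the range −π to π, so the sum of two large angles would appear reduced by a full turn.

```python
import cmath
import math

# Two vectors given by length and angle (30 and 45 degrees).
z1 = cmath.rect(2.0, math.radians(30))
z2 = cmath.rect(1.5, math.radians(45))

r, theta = cmath.polar(z1 * z2)

print(r)                    # close to 2.0 * 1.5
print(math.degrees(theta))  # close to 30 + 45
```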
[Fig. 2.5c]

To return to algebra, we need a more convenient
notation than z = (x, y). Write (x, y) as
xpy, where p is a ‘place-marker’ designed to
show that x comes first and y second. The
object now is to get a suitable notation for p.
In the light of what we have said, we expect
p to be ‘+i’. But we must get this from the rules (2) which appear in
the present notation:

x₁py₁ + x₂py₂ = (x₁ + x₂)p(y₁ + y₂)
x₁py₁ × x₂py₂ = (x₁x₂ − y₁y₂)p(x₁y₂ + x₂y₁)                     (3)

Pick out the complex number 1p0 (vector OA) and write it: 1p0 = 1.
This serves as unity since (3) gives: 1p0 × xpy = xpy.
Next pick out the complex number (−1)p0 (vector OA′) and write
it: (−1)p0 = −1. This is in order since, by (3):

(−1)² = (−1) × (−1) = (−1)p0 × (−1)p0 = 1p0 = 1.

Further, pick out the complex number 0p1 (vector OB) and write it:
0p1 = i. Then by (3) i is such that:

i² = i × i = 0p1 × 0p1 = (−1)p0 = −1.

Finally, if x is any real number, write xp0 = x. From (3):

x₁p0 + x₂p0 = (x₁ + x₂)p0 = x₁ + x₂

x₁p0 × x₂p0 = (x₁x₂)p0 = x₁x₂.
Hence, xp0 is associated with the variable real number x; these
numbers add and multiply amongst themselves and they are given
by points on the ‘real’ axis Ox.

The appropriate way of writing p is now apparent. By applications
of (3):
xpy = xp0 + 0py = xp0 + 0p1 × yp0 = x + i × y

in terms of the notations adopted. Hence xpy = x + iy and p is ‘+i’.


NOTATION: The complex number z = (x, y) = x + iy where + is
addition and i is such that i² = −1.




It follows immediately that the rules (2) or (3) can be forgotten,
replaced by the ordinary operations of + and × for real numbers,
subject only to i² = −1:

(x₁ + iy₁) + (x₂ + iy₂) = (x₁ + x₂) + i(y₁ + y₂)
(x₁ + iy₁) × (x₂ + iy₂) = x₁x₂ + i(x₁y₂ + x₂y₁) + i²y₁y₂
                        = (x₁x₂ − y₁y₂) + i(x₁y₂ + x₂y₁).

We are back where we were with (1). All the operational rules of 2.2
are obeyed by x + iy. The set C of complex numbers z = x + iy is a field.

It is established terminology that x is the real part and iy the
imaginary part of the complex number x + iy. Such a use of the word
‘imaginary’, like the use of ‘complex’ in complex number, scarcely
does the concept of a complex number full justice. However, the
terms are quite convenient. In particular, the two conditions for the
equality of complex numbers appear:
Equation of real and imaginary parts: if x₁ + iy₁ = x₂ + iy₂ then:

x₁ = x₂ and y₁ = y₂.
In conclusion, notice a property of i:
i(x + iy) = ix + i²y = −y + ix.

Hence, in Fig. 2.5d, if P is z = x + iy and Q is iz = −y + ix, then OP
and OQ are at right angles. In an Argand Diagram, multiplication
by i corresponds to rotation through a right angle. For
example: A (z = 1), B (z = i), A′ (z = i² = −1) and B′ (z = i³ = −i)
are so related.

There are now three important interpretations of i:

(1) i is a number such that i² = −1, a root of x² + 1 = 0

(2) i is the vector OB in the Argand Diagram, the unit vector at
right angles to the ‘real’ axis Ox.

(3) ‘multiplication by i’ is equivalent to ‘rotation through a right
angle’ in the Argand Diagram.
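
The third interpretation can be seen at work with Python's built-in complex type, in which `1j` plays the part of i. The sketch below is an illustration with a sample point of our own choosing; the right angle is tested by the vanishing of the scalar product of the two vectors.

```python
z = complex(3, 2)
iz = 1j * z                     # i(x + iy) = -y + ix

# OP and OQ are at right angles when their scalar product is zero.
dot = z.real * iz.real + z.imag * iz.imag
print(iz, dot)

# Repeated multiplication by i carries A to B, A' and B' in turn.
points = [complex(1, 0)]
for _ in range(3):
    points.append(points[-1] * 1j)
print(points)
```

A fourth multiplication by i would bring the point back to A, four right angles making a full turn.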




2.6. Integers. We have jobbed forwards from rationals to real and
complex numbers, keeping throughout to a system of numbers which
is a field, satisfying all the required operational rules. Looking back,
we may well be surprised at how much we have left undefined at the
outset. In what sense are rationals subject to sums, products and
ordering? It is time to job backwards and find out.

The basic entities behind the rationals are the natural numbers,
or positive integers, which we get essentially from counting on our
fingers. Our starting point is the set J+ of all positive integers. We
still have a choice. Either we take ‘positive integer’ as a primitive
(undefined) concept, subject to a set of axioms, and proceed to define
sums, products and order in such a way that we derive the properties
we desire for the set J+. How this may be done is shown in 15.1. Or
we take the positive integers for granted, and simply specify the
properties we assume they have. This is the line now followed.

The set J+ of positive integers is {1, 2, 3, 4, ...}. Unspecified positive
integers are denoted: m, n, p, q, r, .... Their properties, laid out
carefully and specifically, are a formidable batch indeed:

(1) There are in J+ symmetrical operations of addition and
multiplication, specified by addition and multiplication tables
of which the essential parts are:

 + |  1  2  3  4  5  6  7  8  9
---+----------------------------
 1 |  2  3  4  5  6  7  8  9 10
 2 |  3  4  5  6  7  8  9 10 11
 3 |  4  5  6  7  8  9 10 11 12
 4 |  5  6  7  8  9 10 11 12 13
 5 |  6  7  8  9 10 11 12 13 14
 6 |  7  8  9 10 11 12 13 14 15
 7 |  8  9 10 11 12 13 14 15 16
 8 |  9 10 11 12 13 14 15 16 17
 9 | 10 11 12 13 14 15 16 17 18

 × |  1  2  3  4  5  6  7  8  9
---+----------------------------
 1 |  1  2  3  4  5  6  7  8  9
 2 |  2  4  6  8 10 12 14 16 18
 3 |  3  6  9 12 15 18 21 24 27
 4 |  4  8 12 16 20 24 28 32 36
 5 |  5 10 15 20 25 30 35 40 45
 6 |  6 12 18 24 30 36 42 48 54
 7 |  7 14 21 28 35 42 49 56 63
 8 |  8 16 24 32 40 48 56 64 72
 9 |  9 18 27 36 45 54 63 72 81

These are the tables learnt, so painfully, at school. The
symmetry is evident, since in each ‘matrix’ table the entries
below the leading downward diagonal are the same as those
above.

(2) J+ is closed with respect to the commutative operations of
addition and multiplication, i.e. if m and n belong to J+, then
m + n = n + m and mn = nm belong to J+.

(3) Addition and multiplication are both associative, i.e. if p, q
and r belong to J+, then p + (q + r) = (p + q) + r and p(qr) = (pq)r.

(4) Addition and multiplication are distributive, i.e. if p, q and r
belong to J+, then p(q + r) = pq + pr and (p + q)r = pr + qr.

(5) J+ contains an identity for multiplication, i.e. unity 1 so that
m × 1 = 1 × m = m for any m of J+.

(6) Cancellation is valid, i.e. if m + p = m + q then p = q
and if mp = mq then p = q.

(7) If m and n are different members of J+ (m ≠ n), then the
difference of m and n can be defined as belonging to J+:
Either m + p = n for some p in J+ so that the difference
p = n − m.
Or m = n + p for some p in J+ so that the difference
p = m − n.

(8) If a set contains 1, and contains (n + 1) whenever it contains n,
then it comprises all members of J+: the principle of
mathematical induction.

Apart from the first, defining sums and products, the properties 
effectively amount to the following. Properties (2)-(6) inclusive 
specify how many of the operational rules of 2.2 hold for positive 
integers. They all hold except: 

Addition: no zero and no negatives. 

Multiplication: though there is a unity, there are no reciprocals. 
Note that the commutative rule 3 is a consequence of the symmetrical 
definition of sums and products. Also, though rule 5 (inverses) does 
not hold, the weaker form of rule 5A does hold and cancellation is 
valid. Further, in the absence of inverses, the processes of subtraction 
and division do not apply; in general, it is useless to try to subtract 
or divide two positive integers. This is perfectly well-known: 5 pennies 
cannot be taken from someone with only 3. However, property (7)
provides some substitute for subtraction, since m and n have a
difference for any m and n (m ≠ n).

This leads to the last point, covered by the very powerful property
(8): what is the order of the integers? Can we write them in sequence:
1, 2, 3, ..., n, (n + 1), ...? We can, by induction from property (8).
Starting with 1, suppose we have proceeded as far as n; then by
induction the next integer is (n + 1). The order of J+ is: add unity 1
to any integer to get the next in the series. This order implies that
we always know which of two different integers is the earlier and
which is the later in the sequence. Hence we define: m is less than
n (m < n) and n is greater than m (n > m) as two equivalent ways of
stating that m is earlier in the sequence of J+ than n. The difference
property (7) now appears: m < n means that a positive integer p
exists so that m + p = n, i.e. the difference (n − m) is a positive integer.

Hence, J+ is an ordered set, arranged in sequence 1, 2, 3, ..., n,
(n + 1), .... On the other hand, J+ is a good deal short of satisfying
all the operational rules of algebra; it is not a field.

The principle of mathematical induction can be elaborated into a
powerful method of proof. Mathematical induction is applied in the
following form. A property P(n) is to be established, i.e. proved for
all positive integers n. Then (i) check that it holds for n = 1, i.e.
P(1) true; (ii) prove that, if it holds for n, then it holds for (n + 1),
i.e. if P(n) then P(n + 1); finally (iii) say that, by induction, P(n) is
true for all n. The last step simply means: since P(1) true, so is
P(2); since P(2) true, so is P(3); and so on. This is like climbing a
ladder: (i) get your foot on the bottom rung; (ii) see how to move a
foot from one rung to the next; then (iii) you can climb up the ladder
as far as you want.

This simple concept not only provides a ready method of proof;
it is also involved in our intuitive way of getting results. We guess
what a property P(n) should be first by working out the simplest
cases, n = 1, 2, 3, ..., and then by having a shot at a generalisation.
We prove P(n) by induction. For example: Find the sum of the first
n odd integers, i.e. P(n) = 1 + 3 + 5 + ... + (2n − 1). (i) Try n = 1:
P(1) = 1; n = 2: P(2) = 1 + 3 = 4; n = 3: P(3) = 1 + 3 + 5 = 9. This
suggests P(n) = n², verified for n = 1, 2, 3. (ii) If P(n) = n², then
P(n + 1) = 1 + 3 + 5 + ... + (2n − 1) + (2n + 1) = n² + (2n + 1) = (n + 1)².
(iii) So, by induction, P(n) = n² for all positive integers n.
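
The guessing stage of this example is easily mechanised. The sketch below, with a function name of our own, works out the sum for a run of values of n and compares it with n²; it also checks the arithmetic of step (ii). A finite check of this kind supports the guess but does not replace the proof by induction, which alone covers all n.

```python
def sum_first_odds(n):
    """Return 1 + 3 + 5 + ... + (2n - 1)."""
    return sum(2 * k - 1 for k in range(1, n + 1))

# The simplest cases, as in step (i).
print([sum_first_odds(n) for n in (1, 2, 3)])

# The guess P(n) = n*n holds for every n tried.
assert all(sum_first_odds(n) == n * n for n in range(1, 200))

# The arithmetic of step (ii): n*n + (2n + 1) = (n + 1)*(n + 1).
assert all(n * n + (2 * n + 1) == (n + 1) ** 2 for n in range(1, 200))
```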

Two steps now need to be taken, starting from J+, the positive
integers. First, J+ must be extended to the wider set J of all integers,
positive, zero and negative. Then J must itself be extended to the still
wider set R of all rationals. We need spend little time on the first
step,* which can be regarded as carrying the order of J+ backwards
from 1 to 0, to −1, to −2, ..., adding a new integer each time. There
is no positive integer 0 such that n + 0 = 0 + n = n, so define a new
number zero for the purpose. So, 0 + 1 = 1, i.e. 1 is the successor in the
order to the new number 0. Next, define a new number (−1) so that
(−1) + 1 = 0, i.e. 0 is the successor to the new number −1 and (−1)
is the negative of 1. Then, define a further number (−2) so that
(−2) + 1 = −1, i.e. −1 is the successor to the new number −2. It
also follows that (−2) + 1 + 1 = −1 + 1, adding 1 to each side, i.e.
−2 + 2 = 0 and (−2) is the negative of 2. The process continues, in the
order of the positive integers, producing the opposite series of negative
integers. The set J of all integers, still ordered by greater/less,
is the result. It differs from J+ by having a zero and negatives, i.e.
rules 4 and 5 for addition now hold. The only lack in J is that there
are generally no reciprocals, and hence no division process. A set
with this one defect (rule 5 for multiplication not valid, but rule 5A
on cancellation valid) is called an integral domain. J is an integral
domain, coming close to the requirements for a field, but failing to
make it because of the lack of reciprocals.


* The formal development of this step is more complicated and it can best be
expressed (like the other step) by a construction of pairs of positive integers. This is
indicated in 15.1.

The second step, from the set of integers J to that of rationals R, 
corrects this; it fills in the gap by defining reciprocals and ratios of 
integers. The construction involved is that of number pairs and 
‘place-markers’ already used with success for complex numbers: 


DEFINITION: The set R of rational numbers comprises all ordered
pairs of integers (p, q), where p and q range over J, q ≠ 0. Sums and
products are:

(p₁, q₁) + (p₂, q₂) = (p₁q₂ + p₂q₁, q₁q₂)

(p₁, q₁) × (p₂, q₂) = (p₁p₂, q₁q₂).
As with complex numbers, the reason for choosing these particular
addition and multiplication processes is seen by looking ahead to the
final outcome. We have in mind to replace the integer pair (p, q) by
pPq where P is a ‘place-marker’, which in its turn we wish to denote
by ÷ or /, to get the rational number p÷q or p/q. The sum and product
rules given achieve this, since we aim at:

p₁/q₁ + p₂/q₂ = (p₁q₂ + p₂q₁)/(q₁q₂)   and   p₁/q₁ × p₂/q₂ = (p₁p₂)/(q₁q₂)

as in elementary algebra. It is interesting to note that the addition
process is here the more complicated; for complex numbers it is the
product which is involved.
As in the development of the notation x + iy for a complex number,
the definition and rules for sums and products of rational numbers
justify writing the place-marker P as ÷ or /. A rational number is
then p/q, where p and q are integers (q ≠ 0), and all the operational
rules of 2.2 are obeyed (see 2.9 Ex. 7).

The result is that we justify the original assumption that the set R
of rationals is a field. It has been ‘manufactured’ from the simpler
set J of integers (which is not a field since it lacks reciprocals and
division) by a process which first pairs off integers (p, q) and then
associates them with quotients p/q. R can be called the quotient field
of J. It is a formalised version of what is done in practice in
arithmetic. It is also a general process which can be applied in the
construction of other fields, e.g. for polynomials in Chapter 3.
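
The manufacture of rationals from integer pairs can be imitated directly. In the sketch below the functions `add`, `mul` and `as_fraction` are our own names, a rational is held as a tuple `(p, q)` of integers, and Python's `fractions.Fraction` is assumed only as an independent check on the pair rules.

```python
from fractions import Fraction

def add(r, s):
    """(p1, q1) + (p2, q2) = (p1*q2 + p2*q1, q1*q2)."""
    (p1, q1), (p2, q2) = r, s
    return (p1 * q2 + p2 * q1, q1 * q2)

def mul(r, s):
    """(p1, q1) x (p2, q2) = (p1*p2, q1*q2)."""
    (p1, q1), (p2, q2) = r, s
    return (p1 * p2, q1 * q2)

def as_fraction(r):
    """Associate the pair (p, q) with the quotient p/q."""
    p, q = r
    return Fraction(p, q)

r, s = (1, 2), (2, 3)

print(add(r, s), mul(r, s))
assert as_fraction(add(r, s)) == Fraction(1, 2) + Fraction(2, 3)
assert as_fraction(mul(r, s)) == Fraction(1, 2) * Fraction(2, 3)
```

The product comes out as the pair (2, 6), which stands for the same quotient as (1, 3); the pair rules make no attempt to reduce to lowest terms.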

The ordered property of the rationals R, as a reflection of the
basic ordering of the integers J, has still to be established. This is
best done initially for the set R+ of positive rationals obtained from
the set J+ of positive integers, i.e. R+ is that part of R which
corresponds to J+ as part of J, and a positive rational is a pair (p, q) of
positive integers from J+. Duplication in R+ is first eliminated by
amalgamating ‘equivalent’ rationals:

(p₁, q₁) = (p₂, q₂) if p₁q₂ = p₂q₁
which simply means p₁/q₁ = p₂/q₂. Having got rid of this complication,
e.g. by confining (p, q) or p/q to relatively prime p and q, we define:

(p₁, q₁) < (p₂, q₂) if p₁q₂ < p₂q₁
a condition which depends only on the ordering of positive integers
(here p₁q₂ and p₂q₁). In terms of the quotient notation:

p₁/q₁ < p₂/q₂ if p₁q₂ < p₂q₁, by cross multiplication.
The ordering of zero and negative rationals can then be dealt with.
First, zero is less than all positive rationals; then, negative rationals
are ordered by:

−p₂/q₂ < −p₁/q₁ < 0 if p₁q₂ < p₂q₁
for any positive integers p₁, p₂, q₁ and q₂.

One conceptual (but genuine) difficulty has been glossed over.
Each extension, from the set J+, to the integral domain J, and to the
fields R, R* and C, involves a widening of the scope of the number
system, in such a way that each set is contained within the later ones.
The positive integers of J+ are reproduced with others (zero and
negative integers) in J; the integers of J are reproduced with others
(fractions p/q, q ≠ 0 or 1) in R; and so on. The difficulty is that one
number appears in various disguises. The positive integer 3 in J+ is
also the integer +3 in J, the rational 3/1 in R, the real number 3 in R*
and the complex number 3 in C. All we say here is that, though these
are all different, they are essentially equivalent, so that we can switch
from one to the other as necessary. There is, however, rather more to
be said on the matter (see 7.4 below).

The build-up of the number system can be shown in summary in
the classificatory scheme:

Integers J             natural numbers (positive integers), zero, negative integers
Rational Numbers R     integers, fractions
Real Numbers R*        rational numbers, irrational numbers
Complex Numbers C      real numbers, numbers x + iy with y ≠ 0

2.7. Finite sets of integers. The set of positive integers is the number
concept derived from counting on the fingers; but the derivation is
not immediate. The set of positive integers, and the more developed
sets of numbers, are all infinite sets. To one who uses only his fingers
for counting, the infinite sequence of positive integers must be quite
a sophisticated idea. He cannot have much idea of integers in
thousands or millions, or even of the eleven or twelve times tables.
He would start with 0 on the thumb of his left hand, and proceed
1, 2, 3, ... until he reaches 9 on the thumb of his right hand (or he
might go from 1 to 10, much the same thing). But then he goes back
to 0 again on the thumb of his left hand and starts a new cycle to 9.
Suppose, however, he has been introduced to a clock and is keeping
tabs on the hours. Looking at the clock face, he would tick off the
hours from 0 to 1, 2, 3, ... until he comes to 11. He would then go
back to 0 and start again. So, for example, in counting 78 sticks, he
would go 7 complete rounds and find himself left with 8 on his hands.
Equally, 78 hours after midnight on D-day, he would have 6 on the
clock, the hour-hand having done 6 complete revolutions. If he
counts in tens, the number which is really 78 would appear to him as
8. If he counts in twelves, as in time-keeping, the same number
would appear as 6.

There is an important idea here. In a count on two hands, only the 
ten integers {0, 1, 2, ... 9} occur and each number is replaced by its 
residue or remainder on division by 10. Similarly, in time-keeping 
with a clock, every hour shows up as the remainder after division 
by 12, only the twelve integers {0, 1, 2,...11} being used. This 
suggests that we should investigate what happens if, instead of the 
whole set of integers (positive, zero and negative), we keep only the 
remainder on division by a selected positive integer n: 

DEFINITION: The integers modulo n {0, 1, 2, ... (n−1)} (mod n) 
are the finite set obtained by keeping only the remainder on dividing an 
integer by n. 

For example, ordinary counting is mod 10, time-keeping the hours is 
mod 12. 
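
The counting examples above can be checked with a short sketch. The function name `residue` is an illustrative choice, not notation from the text; it assumes only that the modulus n is a positive integer.

```python
def residue(k: int, n: int) -> int:
    """Remainder of k on division by the positive integer n, in {0, ..., n-1}."""
    if n <= 0:
        raise ValueError("modulus must be a positive integer")
    # For n > 0, Python's % always returns a value in 0..n-1, even for negative k.
    return k % n

sticks_on_hands = residue(78, 10)   # counting 78 sticks in tens
hours_on_clock = residue(78, 12)    # 78 hours after midnight on a 12-hour clock
minus_one_mod_5 = residue(-1, 5)    # a negative integer also has a residue
```

Here `sticks_on_hands` is 8 and `hours_on_clock` is 6, as in the text.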

Remember the properties of the integral domain J: all the opera- 
tional rules of algebra hold except that there are no reciprocals, but 
cancellation is valid: 

If mp=mq (m≠0), then p=q. 
As one particular case (q=0), the rule is: 
If mp=0, then either m=0, or p=0, or both. 
There are no divisors of zero; no two (non-zero) integers multiply to 
zero. 

Our object is to see whether the finite set of integers (mod n) does 
better or worse than J in respect of the operational rules obeyed. Start 
with the integers modulo 5: {0, 1, 2, 3, 4} (mod 5), corresponding to 
counting on one hand of five fingers, only the remainder on division 
by 5 being retained. The operations of addition and multiplication 
are, otherwise, exactly as in ordinary arithmetic. Consider 3 and 4 
(mod 5): 

3+4=7=1×5+2 replaced by 2 
3×4=12=2×5+2 again replaced by 2 
i.e. 3+4=2 and 3×4=2 (mod 5). 
Other examples are: 2+3=0; 1+4=0; 2×3=1; 1×4=4. 


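Hence we build up the addition and multiplication tables for integers 
(mod 5): 

 + | 0 1 2 3 4        × | 0 1 2 3 4
---+-----------      ---+-----------
 0 | 0 1 2 3 4        0 | 0 0 0 0 0
 1 | 1 2 3 4 0        1 | 0 1 2 3 4
 2 | 2 3 4 0 1        2 | 0 2 4 1 3
 3 | 3 4 0 1 2        3 | 0 3 1 4 2
 4 | 4 0 1 2 3        4 | 0 4 3 2 1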


For the arithmetic of integers (mod 5), nothing more is required; 
these tables define the operations of + and x. It remains to check 
all the operational rules of 2.2. In doing so, we cannot fail to notice 
one very helpful property of the tables: each integer appears exactly 
once in every row or column (ignoring 0 for multiplication). 
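
As a rough check of this property, the tables for integers (mod 5) can be rebuilt and each row tested. This is a sketch only; the helper names `tables` and `each_entry_once` are hypothetical. Since both tables are symmetrical, checking the rows also covers the columns.

```python
def tables(n):
    """Addition and multiplication tables for the integers (mod n)."""
    add = [[(a + b) % n for b in range(n)] for a in range(n)]
    mul = [[(a * b) % n for b in range(n)] for a in range(n)]
    return add, mul

def each_entry_once(table, ignore_zero=False):
    """True if every row holds each integer of the set exactly once.

    With ignore_zero=True the row and column of 0 are left out,
    as the text does for multiplication.
    """
    start = 1 if ignore_zero else 0
    expected = list(range(start, len(table)))
    return all(sorted(row[start:]) == expected for row in table[start:])

add5, mul5 = tables(5)
addition_ok = each_entry_once(add5)
multiplication_ok = each_entry_once(mul5, ignore_zero=True)
```

Both checks come out true for mod 5. The same test applied to the multiplication table (mod 4) fails.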

The definitions of + and × are symmetrical and the result is 
always an integer of the set {0, 1, 2, 3, 4} (mod 5), i.e. the tables are 
symmetrical about the leading (downward) diagonal and contain 
only the five integers. Hence the set is closed and commutative. It is 
also associative, e.g. 2+(3+4)=2+2=4 and (2+3)+4=0+4=4 
and similarly for products. The distributive law holds, e.g. 

2(3+4)=2×2=4 and 2×3+2×4=1+3=4. 


We must look very carefully for identities and inverses. For addition, 
the zero is 0, since an integer is unaltered by addition of 0. If p is an 
integer, its negative (−p) is such that p+(−p)=0. Hence, we look 
for 0 in any row of the addition table, and every row has a zero. So 
negatives exist: 

(−1)=4; (−2)=3; (−3)=2 and (−4)=1. 

These can be checked by the remainders on division by 5: 
(−1)=(−1)×5+4; (−2)=(−1)×5+3; (−3)=(−1)×5+2; 
(−4)=(−1)×5+1. 

With negatives defined, subtraction follows, e.g. 
2−4=2+(−4)=2+1=3. 
For multiplication, the unity is 1, since an integer is unaltered on 
multiplication by 1. The reciprocal p⁻¹ of any integer p is given by 
p×p⁻¹=1 (p≠0). We look for 1 in a row of the multiplication table, 
and every row (other than that of 0) has a 1. So: 
2⁻¹=3; 3⁻¹=2; 4⁻¹=4 




together with the fact (always true) that 1 is its own reciprocal. 

Hence reciprocals exist and so does division to give p/q (q≠0). For 
example: 
3/2=3×(1/2)=3×2⁻¹=3×3=4; 4/2=4×(1/2)=4×2⁻¹=4×3=2. 


Hence the integers (mod 5) have everything. Manipulation of the 
five integers appears strange at first, but it satisfies all the operational 
rules. The integers (mod 5) are a field. They are an improvement on 
the set of all integers; they have reciprocals and division (see 2.9 
Ex. 19). 

This may be an accidental result, connected with the fact that we 
have selected 5, a prime, to start with. It is necessary to try again. 
Consider the integers, modulo 4 and modulo 6. These would arise if 
we counted on one hand, with 4 or 6 fingers. The addition tables look 
very much like that above for mod 5, just one row and column more 
or less. The integers mod 4 or mod 6 are well-behaved for sums 
(2.9 Ex. 21). It is the multiplication table that varies: 


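{0, 1, 2, 3} (mod 4)        {0, 1, 2, 3, 4, 5} (mod 6) 

 × | 0 1 2 3                 × | 0 1 2 3 4 5
---+---------               ---+-------------
 0 | 0 0 0 0                 0 | 0 0 0 0 0 0
 1 | 0 1 2 3                 1 | 0 1 2 3 4 5
 2 | 0 2 0 2                 2 | 0 2 4 0 2 4
 3 | 0 3 2 1                 3 | 0 3 0 3 0 3
                             4 | 0 4 2 0 4 2
                             5 | 0 5 4 3 2 1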


The pattern of these tables is different from that for integers (mod 5). 
In particular, some rows do not contain the integer 1; instead they 
have extra zeros. The effect is to be seen in the definition of recipro- 
cals. Unity is still 1 and 1 is the reciprocal of itself. For integers 
(mod 4), 3⁻¹=3 so that 3 is also the reciprocal of itself; but there is 
no integer which multiplies 2 to give 1, i.e. 2 has no reciprocal. Rule 
5 on reciprocals breaks down. Worse still, the weaker rule 5A also 
fails. It is seen that 2×2=0, i.e. 2 is a divisor of zero. This arises 
because 4=2×2 is not a prime; when 4 is replaced by 0 (mod 4), 
then 2×2=0. Equally, for integers (mod 6): 6=2×3, and there are 
difficulties with 2, 3 and 4. From the table, 5⁻¹=5 so that 5 (like 1) 
is the reciprocal of itself. But there are no reciprocals for 2, 3 or 4. 
Moreover these are divisors of zero: 2×3=3×4=0. 




The conclusion is that integers (mod 5) form a field; they are 
better behaved than J itself. On the other hand, integers (mod 4) 
and integers (mod 6) satisfy neither rule 5 nor rule 5A; they have 
zero divisors and are worse than J. The result is, in fact, quite 
general: integers (mod n) are a field if n is prime, but fail to satisfy 
rules 5 and 5A if n is not prime. 
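
The general result can be explored by brute force for small n. The following is a minimal sketch; the names `reciprocals`, `zero_divisors` and `is_field` are illustrative, and the field test here looks only at the reciprocal rule, the other rules being taken as already checked in the text.

```python
def reciprocals(n):
    """Map each non-zero p (mod n) that has a reciprocal to that reciprocal."""
    return {p: q for p in range(1, n) for q in range(1, n) if (p * q) % n == 1}

def zero_divisors(n):
    """Non-zero integers (mod n) that multiply some non-zero integer to 0."""
    return sorted({p for p in range(1, n) for q in range(1, n) if (p * q) % n == 0})

def is_field(n):
    """True if every non-zero integer (mod n) has a reciprocal."""
    return n >= 2 and len(reciprocals(n)) == n - 1

field_moduli = [n for n in range(2, 14) if is_field(n)]
```

For the range tried, `field_moduli` contains exactly the primes 2, 3, 5, 7, 11, 13, while `zero_divisors(4)` and `zero_divisors(6)` give the divisors of zero named above.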


2.8. The binary system. The simplest finite set is that comprising 
only the integers 0 and 1, the zero for addition being 0 and the unity 
for multiplication being 1. All that is needed for the set to be a field 
is that 1 should be both the negative and the reciprocal of itself. This 
is achieved with the set {0, 1} (mod 2). 

It is useful, however, to start more simply. In the set {0, 1}, assume 
that sums and products have the following three (very familiar) 
properties 

(i) Addition of 0 to any integer leaves it unchanged. 
(ii) Multiplication of any integer by 0 gives 0. 

(iii) Multiplication of any integer by 1 leaves it unchanged. 

The addition and multiplication tables for {0, 1} are then: 

 + | 0 1        × | 0 1
---+-----      ---+-----
 0 | 0 1        0 | 0 0
 1 | 1 *        1 | 0 1

and the only thing left open is the meaning of 1+1, to fill the space 
marked *. There are several possibilities, still using only 0 and 1: 


(1) Specify 1+1=0 so that: 

 + | 0 1
---+-----
 0 | 0 1
 1 | 1 0

The set is now {0, 1} (mod 2). Sums and products are the usual 
arithmetical ones, provided only that every integer is replaced by 
its remainder on division by 2, e.g. 1+1=2 replaced by 0. The 
set is a field, the simplest instance of the field {0, 1, ... (n−1)} (mod n), 
n prime. 




One application is in the treatment of even and odd integers, and 
so of other entities which can be described as even and odd. An even 
integer has 0, and an odd integer has 1, as remainder on division by 2. 
So all even integers appear as 0 and all odd integers as 1 (mod 2). The 
tables for sums and products then state: 

Addition: even+even=odd+odd=even and even+odd=odd. 

Multiplication: even×even=even×odd=even and odd×odd=odd. 

(2) Specify 1+1=1 so that: 

 + | 0 1
---+-----
 0 | 0 1
 1 | 1 1

The set is now a different one and, since 1 has no negative, it is not a 
field. One interpretation which can be given to addition here is: 
p+q=larger of (p, q). 
The difference as compared with ordinary arithmetic is that 1+1=1 
and so 
1+1+...(n times)=1 (and not n). 

An application of such a set appears in 4.4 below. 

(3) Allow now for numbers with two (or more) digits, each digit 
being either 0 or 1. For two digit numbers, 00 and 01 are merely 0 
and 1 in disguise, but 10 and 11 are new. Specify 1+1=10 so that: 

 + | 0  1
---+------
 0 | 0  1
 1 | 1  10

Suppose, further, that the simple arithmetic of multi-digit numbers 
still applies. 

So, for addition: 

   10      10      10      11
    1      10      11      11
  ---     ---     ---     ---
   11     100     101     110


where 1+1=10 and a ‘carry one’ process is required, e.g. in adding 
11 and 11, the right-hand 1’s add to 10, the 1 is carried forward to 
the next digit: 

1+1+1=1+(1+1)=1+10=11. 


For multiplication: 

   10      10      10      11
    1      10      11      11
  ---     ---     ---     ---
   10     100      10      11
                  10      11
                  ---    ----
                  110    1001


with the same ‘carry one’ process. This is the binary system for hand- 
ling integers. 

Another approach makes the nature of the binary system clear. 
In the decimal system, any integer is written in terms of powers of 10. 
Reading the number from right to left, the first digit represents so 
many (0, 1, 2, ... or 9) units, the second so many 10’s, the third so 
many 100’s (100=10²), and so on. For example: 

147=1×10²+4×10+7 (equals 7 in mod 10). 

Generally: $a_n a_{n-1} \ldots a_1 a_0 = \sum_{r=0}^{n} a_r 10^r = a_n 10^n + a_{n-1} 10^{n-1} + \ldots + a_1 10 + a_0$ 
where the a’s are from the set {0, 1, 2, ... 9}. 

In the binary system, 2 replaces 10. Any integer is written in terms 
of powers of 2 and the digits (multiples) are written 0 or 1. From 
right to left, the first digit is 0 or 1, the second is a multiple (0 or 1) 
of 2, the third a similar multiple of 2²=4, and so on. Successive 
powers of 2 are: 

n:    1   2   3   4    5    6    7     8    ...
2ⁿ:   2   4   8   16   32   64   128   256  ...

Hence 147 must start with 1×2⁷=128 and remainder 19: 
147=1×2⁷+19; 19=1×2⁴+3; 3=1×2+1. 

So: 147=1×2⁷+0×2⁶+0×2⁵+1×2⁴+0×2³+0×2²+1×2+1 

i.e. 147=10010011 taking 8 digits altogether. 
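
The conversion of 147 can be reproduced mechanically. The sketch below uses repeated division by 2, collecting remainders, rather than the subtraction of the highest power of 2 used above; both routes give the same digits. The function names are illustrative.

```python
def to_binary(k: int) -> str:
    """Binary digits of a non-negative integer, most significant digit first."""
    if k < 0:
        raise ValueError("only non-negative integers are handled")
    if k == 0:
        return "0"
    digits = []
    while k > 0:
        digits.append(str(k % 2))  # remainder on division by 2 is the next digit
        k //= 2
    return "".join(reversed(digits))

def from_binary(digits: str) -> int:
    """Value of a string of binary digits, summing digit times power of 2."""
    value = 0
    for d in digits:
        value = 2 * value + int(d)
    return value

binary_147 = to_binary(147)
```

`binary_147` is the string "10010011", and `from_binary("10010011")` returns 147 again.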




The integers up to 32 appear in the binary notation: 


Decimal  Binary | Decimal  Binary | Decimal  Binary | Decimal  Binary
   1        1   |    9      1001  |   17     10001  |   25     11001
   2       10   |   10      1010  |   18     10010  |   26     11010
   3       11   |   11      1011  |   19     10011  |   27     11011
   4      100   |   12      1100  |   20     10100  |   28     11100
   5      101   |   13      1101  |   21     10101  |   29     11101
   6      110   |   14      1110  |   22     10110  |   30     11110
   7      111   |   15      1111  |   23     10111  |   31     11111
   8     1000   |   16     10000  |   24     11000  |   32    100000

In general: $b_n b_{n-1} \ldots b_1 b_0 = \sum_{r=0}^{n} b_r 2^r = b_n 2^n + b_{n-1} 2^{n-1} + \ldots + b_1 2 + b_0$ 


where the b’s are from the set {0, 1}. 

As a check, multiplication of 11x11 gives 1001 in the binary 
system (as above). Here 11 is 3 and 1001 is 9; the product is 3 x 3=9. 

The binary system of denoting numbers has entered the popular 
domain since the introduction of high-speed computers. Electronic 
circuits are well-adapted to handling only two digits 0 or 1 (circuits 
open or closed). There are other possible systems, e.g. the duodecimal 
based on 12 and the octal based on 8. Numbers in octals are easily 
linked to the binary system used in computers (see 2.9 Ex. 25). 
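
The link between octal and binary can be sketched as follows, on the assumption that each octal digit is simply written out as three binary digits. The function name `octal_to_binary` is hypothetical.

```python
def octal_to_binary(octal_digits: str) -> str:
    """Translate an octal numeral digit by digit into binary.

    Each octal digit 0..7 becomes a group of exactly three binary digits;
    leading zeros of the final result are dropped.
    """
    # int(d, 8) rejects anything that is not a valid octal digit.
    groups = [format(int(d, 8), "03b") for d in octal_digits]
    return "".join(groups).lstrip("0") or "0"

binary_of_42 = octal_to_binary("42")
```

Here 4 becomes 100 and 2 becomes 010, so `binary_of_42` is "100010", which is 34 in decimal.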


2.9. Exercises 
1. Illustrate a difficulty with recurring decimals by adding 4/3=1.333... to 
5/3=1.666... to give 3=2.999.... Is there any difference between 0.999... and 1? 

2. Rationals as decimals. Consider the rational p/q where p and q are integers, 
q>1, p and q relatively prime. Show that p/q is a terminating decimal if and 
only if q has only 2’s and 5’s as factors, and that the decimal recurs otherwise. 
Illustrate by writing 1/3, 1/7, 1/11 and 1/13 as recurring decimals. An approxi- 
mation to π is 22/7=3.142857 (recurring) and 355/113 is closer. Use 355/113 to illustrate 
the problem of finding whether a decimal recurs. 
3. Attempt to make subtraction a − b a basic operation in the set of rationals 
and check that the operational rules of 2.2 do not all hold. In particular, show 
that the associative rule fails: a − (b − c) ≠ (a − b) − c. Illustrate the subsidiary 
nature of division a ÷ b similarly. 

4. Show that the operation ‘to the power’ (a^b = a to the power b) is not 
associative in the set of rationals, i.e. a to the power (b to the power c) is not the 
same as (a to the power b) to the power c. See 1.9 Ex. 17. 

5. Pythagoras’ Theorem. Triangle ABC is right-angled at C; a, b and c are 
the lengths of the sides. Then c² = a² + b². Given rational a and b, to find c is 
equivalent to solving x² − k = 0, where the rational k = a² + b² > 0. Show that c 
is irrational except in such special cases as a = 3, b = 4. 

6. R(√2) and R*(i) as fields. The set of α = a + b√2 (a and b rationals) satisfies 
all the operational rules of 2.2 with + and × defined by (1) of 2.3. Check the 
rules for +, noting that zero is 0 (= 0 + 0×√2) and that −α = (−a) + (−b)√2. 
For the rules for ×, see 2.3. Check the distributive rule: α(β + γ) = αβ + αγ by 
substituting full expressions for α, β and γ and amplifying both sides. Similarly, 
show that the set of z = a + ib (a and b real) 
is a field under the + and × rules of (1) of 2.5. 

7. The field of rationals. Show that the set 
of rationals α = (p, q), for p and q integers, 
satisfies all the operational rules of 2.2 with 
+ and × defined as in 2.6. Note that zero 
is (0, q) for any q ≠ 0 and that −α = (−p, q); 
that unity is (p, p) for any p ≠ 0 and that 
α⁻¹ = (q, p). 

8. The irrational π. The area of a circle 
of unit radius is π, approximated (as in 
Euclid) by the area of inscribed and cir- 
cumscribed (regular) polygons. For penta- 
gons (Fig. 2.9), with OA=OR=OB=1, write: 

Area OAB = ½ OA·OB sin ∠AOB = ½ sin (360°/5) 
Area OPQ = OR·PR = OR² tan ∠POR = tan (180°/5). 

Hence show: (5/2) sin (360°/5) < π < 5 tan (180°/5) and evaluate these bounds from 
trigonometric tables. Generalise and obtain a sequence of nested intervals 

[(n/2) sin (360°/n), n tan (180°/n)] for n=3, 4, 5, 6, ... 

defining π. 

9. Consider the set S of positive rationals x such that x² > 2. In the field of rationals, 
S has a lower bound (e.g. a = 1). Suppose a (rational) is any lower bound, so 
that a² < 2. Show that there is a larger rational b > a so that b² < 2 (e.g. by 
expressing a² as a decimal, short of 2). Now look at the same set S of rationals, 
but in the field of real numbers. Then a = √2 is such that no real number can 
be inserted between a and the rationals of S. Hence deduce that S has no 
rational GLB but that it has the real GLB √2. 




10. Attempt to define i as dividing L and G in a cut of R, where L comprises 
rationals x such that x² + 1 < 0 and G rationals x such that x² + 1 > 0. Show that 
this fails because L is empty, i.e. i is not real. 

11. In Fig. 2.5b, show that P is $(x_1 + x_2, y_1 + y_2)$ where $P_1$ is $(x_1, y_1)$ and $P_2$ is 
$(x_2, y_2)$. Deduce that $z = z_1 + z_2$ is given by OP. 

12. Show that the general complex number can be written 

z = r(cos θ + i sin θ) 
where r is the absolute value and θ the argument (see Fig. 2.5a). Show that 
r=1 for each of the four points A, B, A’ and B’ of an Argand Diagram and 
that θ=0°, 90°, 180° and 270°, giving z=1, i, −1 and −i respectively. 
13. Show that $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$ give: 

$z_1 z_2 = r_1 r_2 \{\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\}$ 

using the addition formulae (Appendix A.7). Interpret in Fig. 2.5c. 
14. De Moivre’s Theorem. If z = r(cos θ + i sin θ), then 

z² = r²(cos 2θ + i sin 2θ). 
Generalise and deduce the theorem of de Moivre (1667-1764): 

(cos θ + i sin θ)ⁿ = cos nθ + i sin nθ (n a positive integer). 

15. Show that the square of (1+i) is 2i and deduce that √i = ±(1+i)/√2. 

*16. Powers of a complex variable. Consider $z^a$ where z = r(cos θ + i sin θ) 
and a is a rational number. If a is an integer, show that $z^a = r^a(\cos a\theta + i\sin a\theta)$ 
and write 1/z as a particular case. Illustrate the case of fractional a by showing 
that one value of √z is √r(cos ½θ + i sin ½θ). (Note: square up to z.) Then write 
z = r{cos (n360° + θ) + i sin (n360° + θ)} for any integer n and show that 
another value of √z is 

√r{cos (180° + ½θ) + i sin (180° + ½θ)} = −√r(cos ½θ + i sin ½θ). 
Check that these are the only values and that √i = ±(1+i)/√2. 

17. Cube roots of unity. Write $z_1 = \tfrac{1}{2}(-1 + i\sqrt{3})$ and $z_2 = \tfrac{1}{2}(-1 - i\sqrt{3})$. Show 
that $z_1^2 = z_2$, $z_1^3 = 1$; and that $z_2^2 = z_1$, $z_2^3 = 1$. Deduce that $z_1$ and $z_2$ are cube 
roots of 1 (solutions of x³ − 1 = 0). What is the third cube root? 

18. By mathematical induction, show that: 1+2+3+...+n = ½n(n+1) 
and that $1 + 2 + 2^2 + \ldots + 2^{n-1} = 2^n - 1$. Generalise the first by showing that 
½n{2a + (n−1)d} is the sum of n terms of an AP (first term a, common differ- 
ence d), and the second to give $a(r^n - 1)/(r - 1)$ as the sum of n terms of a GP 
(first term a, common ratio r ≠ 1). 

19. Contrast the set {0, 1, 2, 3, 4} (mod 5) with the set {0, 1, 2, 3, ...} of all 
positive integers (including zero). The former as a field is closed under +, −, 
× and ÷; the latter lacks differences and quotients. Establish that the 
integers (mod 5) have no primes, e.g. 1=2×3=4×4; that powers are defined, 
e.g. 2²=4, 2³=3, 2⁴=1; that the only perfect square (apart from 0 and 1) is 
4=2×2=3×3. Deduce that the roots of x²=4 are x=2 or 3 in the field of 
integers (mod 5) and compare with the solution for all positive integers 
(x=2 only). 




20. Construct + and x tables for the integers (mod 3) and show that the 
set is a field, similar to the integers (mod 5). 
21. Show that the addition tables for the integers (mod 4) and (mod 6) are 

 + | 0 1 2 3          + | 0 1 2 3 4 5
---+---------        ---+-------------
 0 | 0 1 2 3          0 | 0 1 2 3 4 5
 1 | 1 2 3 0          1 | 1 2 3 4 5 0
 2 | 2 3 0 1          2 | 2 3 4 5 0 1
 3 | 3 0 1 2          3 | 3 4 5 0 1 2
                      4 | 4 5 0 1 2 3
                      5 | 5 0 1 2 3 4

with the same pattern as for the integers (mod 5) of 2.7. 

*22. Quadratics over the field of integers (mod 5). Consider the quadratic 
ax² + bx + c where a, b and c are integers; it is known that ax² + bx + c = 0 has 
at most two roots in the set of integers. Illustrate that the same result holds if 
a, b and c are integers (mod 5) by showing that 

x² + 4 = (x − 1)(x − 4) and x² + 2x + 2 = (x − 1)(x − 2) 
are unique factors, and hence that x² + 4 = 0 has roots x = 1, 4 and that 
x² + 2x + 2 = 0 has roots x = 1, 2 (mod 5). 

*23. Show that the result of Ex. 22 fails when a, b and c are integers (mod 6) 
by checking that x² + x = x(x − 5) = (x − 2)(x − 3) so that x² + x = 0 has four 
roots x = 0, 2, 3, 5 (mod 6). 


24. Show that $0.a_1 a_2 \ldots a_n$ is $\sum_{r=1}^{n} a_r 10^{-r}$ in decimals and $\sum_{r=1}^{n} a_r 2^{-r}$ in binary. 
Show that, in the binary system: 1/8=0.001, 1/4=0.01, 3/8=0.011, 1/2=0.1, 
5/8=0.101, 3/4=0.11 and 7/8=0.111. 
25. Octal system. Count in powers of 8 instead of 2 (binary) or 10 (decimal) 
and show that any integer appears as: 

$b_n b_{n-1} \ldots b_1 b_0 = \sum_{r=0}^{n} b_r 8^r$ 
where the b’s are from {0, 1, 2, ... 7}. Check that: 

Binary       Octal   Decimal
100010        42       34
111010        72       58
100001010    412      266

are three integers in alternative forms. Notice that 4 is 100 and 2 is 010 in 
binary and hence that 42 in octal becomes 100010 in binary. Devise a simple 
translation from octal to binary, each digit in octal becoming a set of three 
digits in binary. 


CHAPTER 3 


POLYNOMIALS 


3.1. The fundamental theorem of arithmetic. A schoolboy knows 
that any integer can be factorised into primes. He also knows that the 
‘highest common factor’ or H.C.F. of two integers is to be got by com- 
paring their factors and by picking out the common ones. For example: 
140=2×70=2×2×35=2²×5×7 
1155=3×385=3×5×77=3×5×7×11 
so that the H.C.F. is 5×7=35. 
The usual method adopted is to ‘fish around’ for factors, i.e. inspec- 
tion for divisibility by 2, 2², ...; then by 3, 3², ..., by 5, 5², ... and so 
on through the primes. This is altogether too slow for larger numbers. 
Surely there is a more systematic approach? More basically im- 
portant: why is it assumed that any factorisation achieved is unique? 
It may well be, but surely a proof is needed? A systematic develop- 
ment proceeds as follows. 

In the set J+ of positive integers, 1 has unique properties; it is the 
identity for multiplication, it has no factors itself and it does not 
affect the factors of any other integer. Consider the set J+ (n>1) 
apart from 1. Let m and n (m<n) be any two integers and divide 
m into n to give a quotient $q_1$ and a remainder $r_1$. Both $q_1$ and $r_1$ are 
integers and $r_1 < m$, except that $r_1 = 0$ is possible (in which case the 
division process is complete). Suppose $r_1 \neq 0$ and proceed by dividing 
$r_1$ into m to give quotient $q_2$ and remainder $r_2 < r_1$. Suppose $r_2 \neq 0$, 
divide $r_2$ into $r_1$ to give quotient $q_3$ and remainder $r_3 < r_2$. This 
process continues until a remainder is found equal to zero (division 
process complete). This must happen sooner or later since the 
integral remainders are decreasing ($r_1 > r_2 > r_3 > \ldots$). Let $r_k$ be the 
last non-zero remainder. It is the H.C.F. of m and n since, jobbing 
backwards, $r_k$ divides $r_{k-1}$, divides $r_{k-2}$, divides $r_{k-3}$, ... divides $r_1$, 
divides m, and divides n. More formally: 

$n = q_1 m + r_1$; $m = q_2 r_1 + r_2$; $r_1 = q_3 r_2 + r_3$; ... 
$r_{k-2} = q_k r_{k-1} + r_k$; $r_{k-1} = q_{k+1} r_k$. 

So: $r_k = r_{k-2} - q_k r_{k-1} = r_{k-2} - q_k (r_{k-3} - q_{k-1} r_{k-2})$ 
$= -q_k r_{k-3} + (1 + q_k q_{k-1}) r_{k-2}$ 
= (integer)m + (integer)n eventually. 


The integers in the last line are positive or negative. 
This process of repeated division is the Division Algorithm, or 
Euclid’s Algorithm.* It leads to the result: 


THEOREM: If c is the H.C.F. of two positive integers m and n, then 
there exist positive or negative integers λ and μ so that 

λm + μn = c ......(1) 

Example: 1155=8×140+35 
140=4×35 
so that the H.C.F. of 1155 and 140 is 35, 
and 35=1155−8×140, i.e. (1) with λ=1 and μ=−8. 
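
The repeated division, together with the back-substitution that yields λ and μ, can be sketched in a few lines. This is one common way of organising the computation (carrying the multipliers forward at each step instead of substituting backwards at the end); the function name `extended_euclid` and its variable names are illustrative.

```python
def extended_euclid(m: int, n: int):
    """Return (c, lam, mu) with c = H.C.F. of m and n and c = lam*m + mu*n."""
    # Invariant: old_r = old_lam*m + old_mu*n and r = lam*m + mu*n.
    old_r, r = m, n
    old_lam, lam = 1, 0
    old_mu, mu = 0, 1
    while r != 0:
        q = old_r // r                      # quotient of the current division
        old_r, r = r, old_r - q * r         # new remainder
        old_lam, lam = lam, old_lam - q * lam
        old_mu, mu = mu, old_mu - q * mu
    return old_r, old_lam, old_mu           # last non-zero remainder and multipliers

c, lam, mu = extended_euclid(1155, 140)
```

For 1155 and 140 this gives c = 35 with λ = 1 and μ = −8, as in the example.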


Continuing, we say that two positive integers m and n are relatively 
prime if they have no common factors (except 1). Their H.C.F. is 1, 
and (1) becomes: 

If m and n are relatively prime, then λ and μ exist so that 

λm + μn = 1 ......(2) 

Further, a positive integer p is prime if it has no factors other than 1 
and p itself. A prime p is also relatively prime to all integers except 1 
and multiples of p. Two results can now be obtained in succession, 
leading to the fundamental result on factorisation. 


THEOREM: If p is prime and divides mn, where m and n are two 
positive integers, then either p divides m, or p divides n, or both ......(3) 
Though this is a result used automatically in elementary arithmetic, 
it still needs proof. If p does divide m, we need go no further. If p 
does not divide m, then p and m are relatively prime and, by (2), λ 
and μ exist for: 

λp + μm = 1. 
So λpn + μmn = n. 

But p divides mn (given) and it divides pn; hence it divides n. 
Q.E.D. 


* ‘Algorithm’ is a rule for computation, a modern mis-spelling of ‘algorism’. The 
Latin ‘algorismus’ derives from the surname of an Arab mathematician. 


THEOREM: Every positive integer n can be factorised into primes: 

$n = p_1 p_2 \ldots p_i$ for some i ......(4) 
If n is prime, it is itself the only prime factor. If n is not prime, then 
it has factors (other than 1 and n itself): $n = n_1 n_2$ where $n_1$ and $n_2$ are 
both less than n. Now take $n_1$ and $n_2$ in turn and proceed with 
factorisation. The process stops when only primes are left. This must 
happen sooner or later since the integers are getting smaller at each 
step. Q.E.D. 
It remains to show that the factorisation (4) is unique. Suppose 
that: 
$n = p_1 p_2 \ldots p_i = q_1 q_2 \ldots q_j$ (p’s and q’s prime). 
Then $p_1$ divides n and so divides $q_1 q_2 \ldots q_j$. By (3), $p_1$ divides at least 
one of the q’s and (by suitable shuffling) it can be taken that $p_1$ 
divides $q_1$. But $p_1$ and $q_1$ are primes, so: $p_1 = q_1$. Divide out $p_1 = q_1$ 
and proceed to treat: 

$p_2 p_3 \ldots p_i = q_2 q_3 \ldots q_j$ 
in the same way. In the end, each p is identified with a q until all are 
taken. Hence, i=j and the p’s and q’s are the same set of primes. 
Hence: 


FUNDAMENTAL THEOREM OF ARITHMETIC: Every positive integer 
n(>1) can be uniquely factorised into a product of primes: 


$n = p_1 p_2 \ldots p_i$. 
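
As an illustration of the theorem, the ‘fishing around’ method of 3.1 can be written as trial division. This is a sketch for small integers only, not an efficient procedure; the name `prime_factors` is hypothetical.

```python
def prime_factors(n: int):
    """Prime factors of an integer n > 1, in non-decreasing order."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:      # divide out p as often as it goes
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                  # whatever is left over is itself prime
        factors.append(n)
    return factors

factors_140 = prime_factors(140)
factors_1155 = prime_factors(1155)
```

The lists obtained are [2, 2, 5, 7] and [3, 5, 7, 11], and the common entries 5 and 7 give the H.C.F. 35 found earlier.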

In this result, the integer 1 is excluded since it divides every 
integer, i.e. it is the unit of the set of positive integers. Nothing is 
changed in $n = p_1 p_2 \ldots p_i$ if n or any of the unique factors is multiplied 
by 1. More generally for a set S: 

DEFINITION: A unit of S is an element of S which divides (is a factor 
of) every element of S. 
So, just as 1 is the only unit of the set of positive integers, 2 is the 
only unit of the set of even positive integers. There may be more than 
one unit, e.g. the set of all integers (positive and negative) has two 
units +1 and −1. 


3.2. Gaussian integers. As a digression of some interest, consider the 
field C of complex numbers x+iy, where x and y are real numbers. 
Various subsets of C can be taken, e.g. a+ib where a and b are 
rationals. One interesting subset is that of m+in where m and n are 
integers from J. The field C is represented by all points in the plane 
Oxy on an Argand Diagram; the subset consists of the points (m, n), 
where m and n are integers, forming a lattice (or trellis-work) over 
the plane as shown in Fig. 3.2. As in 2.3, the subset of m+in is 
obtained from the integral domain J by adjunction of the single 
element i. It can be denoted J(i), the set 
of Gaussian Integers, after Gauss (1777- 
1855). J(i) is an integral domain like J, 
as is easily established (3.9 Ex. 2). All the 
operational rules apply, except for recip- 
rocals and division, but including can- 
cellation. 

[Fig. 3.2: the lattice of points (m, n) representing the Gaussian integers] 

Let us attempt to extend the idea of 
prime factorisation to J(i). The funda- 
mental theorem shows that this is all right 
(unique) for positive integers, and for all 
(positive and negative) integers only a 
little amendment is needed. Here −1 is a unit as well as 1 and both 
are ignored in factorising. The process is still unique since any 
negative integer can be handled: 
−5=(−1)·5=1·(−1)·5 
and −6=(−1)·2·3=1·(−1)·2·3 
as for positive integers: 
5=1·5=(−1)(−1)·5 
and 6=1·2·3=(−1)(−1)·2·3. 
Apart from the units 1 and (−1), factorisation is unique. However, 
Gaussian integers m+in seem to raise further difficulties. To illus- 
trate, take the prime 5: 
5=(2+i)(2−i)=(1+2i)(1−2i) 
=(−2−i)(−2+i)=(−1−2i)(−1+2i). 
In J(i), it would appear that 5 is not prime, and that factorisation is 
not unique. The first thought is correct; 5=5+i0 can be split into 
factors like (2+i)(2−i) in J(i). The second, however, is not correct. 
Factorisation is still unique in J(i). The point is that there are four 
units which divide every Gaussian integer; they are 1, −1, i, −i. So: 
(2+i); (−2−i)=(−1)(2+i); (−1+2i)=i(2+i); (1−2i)=−i(2+i) 


are all essentially the same, i.e. (2+i). Hence, apart from the units 
(±1, ±i), the factorisation of 5 is unique: 5=(2+i)(2−i). 

The Gaussian integers have four units (±1, ±i) but, with this 
convention, factorisation into primes is unique. The primes in the 
Gaussian integers J(i) can be different from those in the integers J. 
Some are the same, e.g. 3 is a prime in both. Some are different, e.g. 
5 is prime in J but not in J(i), whereas (2+i) is prime in J(i) but 
doesn’t appear in J. The result has been illustrated here, not proved 
strictly; but the proof is very similar to that developed for integers 
in 3.1 above.* 
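
The factorisations of 5 quoted above can be verified by direct multiplication. In this sketch a Gaussian integer m+in is held as the pair (m, n); the names `gmul`, `UNITS` and `associates` are illustrative and not notation from the text.

```python
def gmul(z, w):
    """Product of two Gaussian integers held as pairs (m, n) for m + in."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)   # uses i*i = -1

UNITS = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # 1, -1, i, -i

def associates(z):
    """The four products of z with the units; all count as the same factor."""
    return {gmul(u, z) for u in UNITS}

five = gmul((2, 1), (2, -1))            # (2+i)(2-i)
same_factor = associates((2, 1))        # 2+i, -2-i, -1+2i, 1-2i
```

`five` is (5, 0), and `same_factor` contains exactly the four forms listed above, so the apparently different factorisations of 5 differ only by units.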


3.3. Polynomials. A boy studying school algebra would not hesitate 
to say what he means by a ‘polynomial’. He might describe it as an 
expression such as the quadratic 2x²−x−3 or the cubic x³−3x²+4. 
Or, if he is a little more pedantic, he might say that the word does 
double duty, as an adjective and as a noun, so that 2x²−x−3 or 
x³−3x²+4 is a polynomial expression, or more shortly a polynomial. 
If asked to pursue the topic, he might well say that, in application, 
polynomials are usually equated to zero to give a polynomial 
equation such as 2x²−x−3=0 or x³−3x²+4=0. The problem is to 
solve the equation, to find its roots. One way is to factorise the 
polynomial and deduce the roots. For example: 2x²−x−3 has factors 
(x+1)(2x−3) and the equation 2x²−x−3=0 has roots −1 and 3/2. 
Again: x³−3x²+4=(x+1)(x−2)² so that x³−3x²+4=0 has roots 
−1 and 2 (twice). Conversely, if the roots of the equation are found, 
the factors of the polynomial follow. For example, the roots of 
x²+x+1=0 are −½(1±i√3) by the well-known formula for the 
quadratic (Appendix A.3), and x²+x+1 has factors 

{x+½(1+i√3)}{x+½(1−i√3)}. 


This is, in fact, something of a mess. There are two different uses 
of ‘x’ according as we deal with the polynomial expression (function) 
or with the polynomial equation. In a polynomial function such as 
y=x³−3x²+4, we have x as a variable with a domain to be specified. 
For example, if the domain comprises all real numbers, the function 
fixes a real number y to correspond to each real number x we care to 
select. In a polynomial equation such as x³−3x²+4=0, we are in- 
terested in the solution set, with one or more values of x which are 
the roots and which can be regarded as fixed values to be determined. 
Here there is some lack of precision on what number system is in 
mind for x. An opportunist attitude is often taken, x being allowed 
to be rational, real or complex according to what turns up. There is 
also some uncertainty on what numbers we can use for the coeffi- 
cients in the polynomial. Does it have integral, rational, real or 
complex coefficients? We need to say which. This is particularly 
important in handling the parametric form, e.g. ax²+bx+c as the 
general quadratic or ax³+bx²+cx+d as the general cubic. The 
replacement set of the parameters a, b, c, ... must be specified. 

*See Birkhoff and MacLane: A Survey of Modern Algebra (Macmillan, N.Y., 
Revised Edition, 1953), pp. 413-16. 

We have to dig very deep indeed to get down to a good foundation. 
It is worth while making an effort here because of the insight gained 
into the meaning of polynomials in relation to the number systems 
used. A remarkable fact appears: polynomials are very like integers. 
The parallel is almost exact. In the end, a fundamental theorem of 
algebra emerges to match that of arithmetic. 

The difficulty is to define a polynomial without begging any 
questions. Reverse the order of the terms (for the moment) and 
write a+bx+cx²+.... What is x? In view of what we have said, we 
hedge: leave x undefined. We then concentrate on the coefficients and 
say that a polynomial is just a set of coefficients (a, b, c, ...). The order 
of the coefficients matters; it must not be disturbed. We want, for 
example, (a, b, c) to mean a+bx+cx² but (a, c, b) to mean a+cx+bx². 
We have not said how many coefficients there are. However, we must 
agree to take only a finite number of non-zero coefficients. We can 
agree, further, to ignore any zero coefficients which follow the last 
non-zero one. It does not matter whether we write them or not, 
except to make clear which is the last non-zero coefficient. 

The replacement set of the coefficients can be any number system. 
To have something specific to talk about, take the replacement set 
as the system of rationals. Hence, for illustration, we consider 
polynomials with rational coefficients. 

Our definition of a polynomial is an ordered set of rational co- 
efficients containing only a finite number of non-zero entries. Rules 
for sums and products are to be laid down. The rule for sums is easy: 

$(a_1, b_1, c_1, \ldots) + (a_2, b_2, c_2, \ldots) = (a_1 + a_2, b_1 + b_2, c_1 + c_2, \ldots)$ ......(1) 

The rule for products is not so obvious. For sets of three coefficients 
(the quadratic case), we can write: 
$(a_1, b_1, c_1) \times (a_2, b_2, c_2)$ 

$= (a_1 a_2, a_1 b_2 + a_2 b_1, a_1 c_2 + b_1 b_2 + a_2 c_1, b_1 c_2 + b_2 c_1, c_1 c_2)$ ......(2) 
and other such rules can be written for other cases. It all seems odd 
and arbitrary, but we are, in fact, just looking ahead. 

To bring in ‘x’, we take as our guide the use of ‘+i’ as a place- 
marker in a complex number, except that we plan to use several 
place-markers: ‘+x’, ‘+x²’, ‘+x³’, .... As a notation, write 

(a, b, c, ...)=a(+x)b(+x²)c...=a+bx+cx²+.... 
The rules (1) and (2) for sums and products begin to look familiar: 
$(a_1 + b_1 x + c_1 x^2 + \ldots) + (a_2 + b_2 x + c_2 x^2 + \ldots)$ 
$= (a_1 + a_2) + (b_1 + b_2)x + (c_1 + c_2)x^2 + \ldots$ 

$(a_1 + b_1 x + c_1 x^2) \times (a_2 + b_2 x + c_2 x^2)$ 

$= a_1 a_2 + (a_1 b_2 + a_2 b_1)x + (a_1 c_2 + b_1 b_2 + a_2 c_1)x^2 + (b_1 c_2 + b_2 c_1)x^3 + c_1 c_2 x^4$. 
Hence, the rules are seen to correspond to the ordinary operations of 
algebra; the use of place-markers (+x, +x², ...), with x undefined, 
is vindicated. It is then found that the set of all polynomials obeys 
all the operational rules for sums (and differences) and all those for 
products, except that there are no reciprocals (and no division). In 
the set of polynomials, we add, subtract and multiply according to 
familiar rules; we are not, as yet, interested in dividing one poly- 
nomial by another. Polynomials form an integral domain with the 
same properties as the set of integers. A formal development is given 
in 15.2. 
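
The rules (1) and (2) can be tried out mechanically. What follows is only a minimal sketch in Python, not part of the formal development: a polynomial is held as a tuple of coefficients with the constant term first, as in (a, b, c, ...), and the names `trim`, `poly_add` and `poly_mul` are chosen here purely for illustration. The product rule is written in its general form (coefficient k collects every product of a coefficient i places along with one j places along, where i + j = k), which reduces to (2) in the quadratic case.

```python
from itertools import zip_longest


def trim(coeffs):
    """Ignore zero coefficients that follow the last non-zero one."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return tuple(coeffs)


def poly_add(p, q):
    """Rule (1): add corresponding coefficients."""
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=0))


def poly_mul(p, q):
    """Rule (2), generalised: out[k] is the sum of p[i] * q[j] over i + j = k."""
    if not p or not q:
        return ()  # the zero polynomial is held as the empty tuple
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


x = (0, 1)                                # the place-marker x itself
print(poly_mul(x, x))                     # (0, 0, 1), i.e. x^2
print(poly_mul((1, -1, 2), (1, 1, 2)))    # (1, 0, 3, 0, 4), i.e. 1 + 3x^2 + 4x^4
```

Holding the zero polynomial as the empty tuple is a convenience of this sketch; it simply applies the agreement to ignore zeros after the last non-zero coefficient.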

In taking stock, we can make a series of observations: 

(i) There can be many zero coefficients in particular polynomials. 
For example:

    $(0, 0, \ldots) = 0 + 0x + \ldots = 0$   the zero of the set of polynomials

    $(1, 0, \ldots) = 1 + 0x + \ldots = 1$   the unity of the set of polynomials.


Further, (a, 0, ...) = a (any rational); the set of polynomials includes
all rationals.
(ii) The undefined $x$, $x^2$, $x^3$, ... are to be interpreted:
    $(0, 1, 0, \ldots) = 0 + 1x + 0x^2 + \ldots = x$
    $(0, 0, 1, 0, \ldots) = 0 + 0x + 1x^2 + 0x^3 + \ldots = x^2$
    $(0, 0, 0, 1, 0, \ldots) = 0 + 0x + 0x^2 + 1x^3 + 0x^4 + \ldots = x^3$ ...




By the product rule (2): (0, 1, 0) × (0, 1, 0) = (0, 0, 1, 0, 0) which
translates into $x \times x = x^2$. This and similar results establish $x^2$, $x^3$, ...
as powers of x. Hence, the undefined x and all its integral powers are
themselves polynomials.

(iii) The place-marker notation is fully justified. Not only are
$x$, $x^2$, $x^3$, ... polynomials in themselves, but the + sign means
addition. We can operate with polynomials on familiar lines. For
example:

    $(1 - x + 2x^2)(1 + x + 2x^2) = 1 + x + 2x^2$
                                $\quad - x - x^2 - 2x^3$
                                $\quad + 2x^2 + 2x^3 + 4x^4$
                                $= 1 + 3x^2 + 4x^4.$

(iv) In our illustrative case, the coefficients are rational numbers.
More strictly, they are a finite sequence of rationals and, if the last
non-zero item is the coefficient of $x^n$, then n is an integer (n ≥ 0)
called the degree of the polynomial. So:

    $f(x) = a + bx + cx^2 + \ldots + hx^{n-1} + kx^n \quad (k \neq 0)$

is the general polynomial of degree n. Here n is a given integer
(n ≥ 0) and a, b, c, ... k are rationals. A polynomial of zero degree is a
rational number a and the set of all polynomials includes the field of
rationals. We may sometimes require a polynomial to be of positive
degree (n > 0).

(v) A polynomial is not itself a rational number, or indeed a
number at all. In writing $f(x) = a + bx + cx^2 + \ldots$, we specify a set of
coefficients (a, b, c, ...) and leave x undefined. It is to be distinguished
from:

    $f(\alpha) = a + b\alpha + c\alpha^2 + \ldots$   for a given rational α.

This is a rational number; f(α) is obtained from α, a, b, c, ... by
algebraic processes. Further, f(α) can be extended to a real (or
complex) number by simply making α a real (or complex) number.
In the polynomial f(x), x is undefined; in the value f(α), α is some
specified rational, real or complex number.

To get a simpler notation, reverse again the order of the terms and
write coefficients with subscripts. The general polynomial of degree
n is:

    $f(x) = f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0 \quad (n \geq 0,\ f_n \neq 0)$    (3)




where the f's are rationals. The leading coefficient can be factored
out:

    $f(x) = f_n\left(x^n + \frac{f_{n-1}}{f_n}x^{n-1} + \ldots + \frac{f_1}{f_n}x + \frac{f_0}{f_n}\right)$

which is a case of the process known as scalar multiplication (see 15.2,
and 3.9 Ex. 7). If we ignore the factor $f_n$, we can write (3) in alternative
form:

    $f(x) = x^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0 \quad (n \geq 0)$    (4)

where the a's are still rationals. The set of all polynomials, given by
(3) or (4), is an integral domain, subject to all the operational rules
with the sole exception that reciprocals are lacking.

The limitation of polynomials to those with rational coefficients is 
adopted here for purposes of illustration. More generally, the 
coefficients can be from any field F' we care to specify: 


DEFINITION: Given a field F, a set of ordered coefficients from F
containing a finite number of non-zero elements is a polynomial over
the field F denoted $f(x) = f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0$ where x is
undefined, where $f_r$ (r = 0, 1, 2, ... n) is an element of F and where n ≥ 0.


The set of polynomials over F is an integral domain, denoted F[x].
Usually F is the field of rational, real or complex numbers; but there 
are other possibilities as illustrated in 3.9 Ex. 11. 


3.4. Rational fractions. The set F[x] of polynomials f(x) over a field
F is not the end of this line of development. To complete it, we have
the analogy of the set of integers. The integral domain J is made into
the field R of rationals by the process of forming the ‘quotient field’
(2.6). Ordered pairs (p, q) of integers are written, sums and products
defined, and the pairs denoted p/q (q ≠ 0). The exposition can be
repeated word for word, substituting ‘polynomials’ for ‘integers’.
Take two polynomials f(x) and g(x), write the ordered pair {f(x), g(x)},
define appropriate sums and products and identify the pair as the
rational fraction:

    $\frac{f(x)}{g(x)} = \frac{f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0}{g_mx^m + g_{m-1}x^{m-1} + \ldots + g_1x + g_0}$   where g(x) ≠ 0.

The set of all such rational fractions is then found to satisfy all the
operational rules (see 15.2). It is a field:




DEFINITION: Given the integral domain F[x] of polynomials over a
field F, a rational fraction is the ratio of two polynomials of F[x], i.e.
f(x)/g(x) where g(x) ≠ 0. The set of all rational fractions is a field,
denoted F(x).


The field F(x) includes the integral domain F[x], the special cases
where g(x) = 1; it also includes F itself, the special cases where
$f(x) = f_0$, g(x) = 1.

Another way of looking at the set F(x) is as the field obtained by
the adjunction of x (and hence of all its powers) to the number field
F with which we start. Again x is undefined, any additional element.
This is the general process of adjunction (see 15.2).

Consider a particular case, the adjunction of the element x = i to
the field of real numbers. The result is the field of rational fractions
in i with real coefficients. Things are made simpler here by the fact
that $x^2 = i^2 = -1$, and hence that $x^3 = i^3 = -i$, $x^4 = i^4 = 1$, .... The
rational fraction reduces to the ratio of $f_0 + if_1$ to $g_0 + ig_1$, and this
further reduces to the form a + ib (by multiplying numerator and
denominator by $g_0 - ig_1$). Hence, the complex number a + ib is
obtained as a particular rational fraction, the adjunction of x = i to
the field of real numbers.
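
The last reduction can be illustrated by a short sketch in Python. It assumes, for the sake of exact arithmetic, that the four coefficients are rationals (the text allows any real numbers), and the function name `reduce_fraction_in_i` is hypothetical. The working is the one described: multiply numerator and denominator by $g_0 - ig_1$, so that the denominator becomes the real number $g_0^2 + g_1^2$.

```python
from fractions import Fraction


def reduce_fraction_in_i(f0, f1, g0, g1):
    """Return (a, b) such that (f0 + i*f1) / (g0 + i*g1) = a + i*b."""
    denom = Fraction(g0) * g0 + Fraction(g1) * g1   # (g0 + i g1)(g0 - i g1)
    if denom == 0:
        raise ZeroDivisionError("the denominator g0 + i*g1 is zero")
    # (f0 + i f1)(g0 - i g1) = (f0 g0 + f1 g1) + i (f1 g0 - f0 g1)
    a = (Fraction(f0) * g0 + Fraction(f1) * g1) / denom
    b = (Fraction(f1) * g0 - Fraction(f0) * g1) / denom
    return a, b


a, b = reduce_fraction_in_i(1, 2, 3, 4)   # (1 + 2i) / (3 + 4i)
print(a, b)                               # 11/25 2/25
```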

We may proceed with polynomials and their ratios according to
the familiar algebraic processes. Consider the rational fraction
$(x^2 + 2x - 1)/(x^2 + 1)$. Divide:

    x^2 + 1 ) x^2 + 2x - 1 ( 1
              x^2      + 1
                    2x - 2

giving quotient 1 and remainder 2(x - 1). Hence:

    $\frac{x^2 + 2x - 1}{x^2 + 1} = 1 + \frac{2(x - 1)}{x^2 + 1}$

by a process similar to that of reducing a rational to its lowest terms.
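
The division just carried out by hand can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive routine: coefficients are rationals listed from the constant term upward, and the names `strip_zeros` and `poly_divmod` are invented for the purpose.

```python
from fractions import Fraction


def strip_zeros(p):
    """Drop zero coefficients that follow the last non-zero one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Divide f(x) by g(x); coefficients run from the constant term upward.

    Returns (quotient, remainder) with f = quotient * g + remainder and
    the remainder of lower degree than g.
    """
    rem = strip_zeros(Fraction(c) for c in f)
    g = strip_zeros(Fraction(c) for c in g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = [Fraction(0)] * max(len(rem) - len(g) + 1, 0)
    while len(rem) >= len(g):
        shift = len(rem) - len(g)          # power of x in this quotient term
        factor = rem[-1] / g[-1]           # ratio of the leading coefficients
        quotient[shift] = factor
        for i, c in enumerate(g):
            rem[shift + i] -= factor * c   # subtract factor * x^shift * g(x)
        rem = strip_zeros(rem)
    return quotient, rem


# (x^2 + 2x - 1) divided by (x^2 + 1): quotient 1, remainder 2x - 2
q, r = poly_divmod([-1, 2, 1], [1, 0, 1])
print(q, r)
```

An empty list stands for the zero polynomial, so a division that goes exactly returns an empty remainder.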


3.5. Polynomial functions. In the polynomial f(x) of the set F[x], we
have taken x as undefined. We know only that x and all its powers
are themselves polynomials of F[x]. We are entitled, however, to
substitute anything we like for x and to follow through the calculations
indicated by $f(x) = f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0$. The results so
obtained are now to be investigated.

Suppose that $f(x) = f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0$ is a polynomial
over the field of rationals. This does not mean that f(x) is itself a
rational. It is not any kind of number. It is, by definition, simply a
sequence of (rational) coefficients. Suppose, however, that we substitute
for x a rational α. Instead of f(x), we have

    $f(\alpha) = f_n\alpha^n + f_{n-1}\alpha^{n-1} + \ldots + f_1\alpha + f_0.$

The polynomial f(x) goes; in its place, we have f(α) which is a rational
number. f(α) is obtained by arithmetical processes operating on
rationals, α and all the f's. It is essential to keep the two things
separate: f(x) as a polynomial with rational coefficients and undefined
x; f(α) a rational number obtained from a given rational
number α. The same rational coefficients appear in f(x) and f(α);
this is the only link. So, if:

    $f(x) = x^2 - \tfrac{5}{2}x + 1$

then:   $f(\alpha) = \alpha^2 - \tfrac{5}{2}\alpha + 1$   and   $f(1) = 1^2 - \tfrac{5}{2}\cdot 1 + 1 = -\tfrac{1}{2}$.
Similarly:   $f(\tfrac{3}{2}) = -\tfrac{1}{2}$;  $f(2) = 0$;  $f(\tfrac{5}{2}) = 1$;  $f(3) = \tfrac{5}{2}$; ...

It is clear, in terms of elementary algebra, what we are doing here.
We are making the calculations necessary for plotting the ‘function’:
$y = x^2 - \tfrac{5}{2}x + 1$.
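
The distinction between the coefficient sequence and its values can be made concrete with a small sketch in Python. It takes the example polynomial to be $x^2 - \tfrac{5}{2}x + 1$, held as the sequence (1, -5/2, 1) with the constant term first, and evaluates it at several rationals; the function name `evaluate` and the nested-multiplication scheme (Horner's rule) are choices of this sketch, not something the text prescribes.

```python
from fractions import Fraction


def evaluate(coeffs, alpha):
    """Value f(alpha); coeffs run from the constant term upward."""
    value = Fraction(0)
    for c in reversed(coeffs):        # start from the leading coefficient
        value = value * alpha + c     # nested multiplication (Horner's rule)
    return value


f = [Fraction(1), Fraction(-5, 2), Fraction(1)]   # 1 - (5/2)x + x^2

for alpha in (Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(3)):
    print(alpha, evaluate(f, alpha))
```

The list `f` is the polynomial; each printed value is a rational number obtained from it. Only the coefficients are shared between the two.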

The process of writing and graphing polynomial functions is, in
fact, justified on the following lines. f(x) is a given polynomial.
Replace x by any rational α from the field R. The rational value
f(α) is obtained; write it β:

    β = f(α)   a function of α over the field R of rationals.


More usually, α and β are written as x and y. This is in order, as long
as we remember that x and y are rationals, that x is no longer the
undefined x of a polynomial. Hence the polynomial function:

    $y = f(x) = f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0$

for x in the domain of all rationals.


With this achieved, another step is easily made. Replace x in the
polynomial f(x) by a real number α. Then

    $f(\alpha) = f_n\alpha^n + f_{n-1}\alpha^{n-1} + \ldots + f_1\alpha + f_0$

is also a real number, in the field R*. Hence, the same polynomial
function y = f(x) can be written, except that it is defined for x in the
domain of all real numbers. In drawing a graph of the function, we
actually work with x as a rational. But, when we draw a smooth
curve through the plotted points, we are implicitly taking x as a real
number (see 1.4 above).

In the same way, the rational fraction function:

    $y = \frac{f(x)}{g(x)} = \frac{f_nx^n + f_{n-1}x^{n-1} + \ldots + f_1x + f_0}{g_mx^m + g_{m-1}x^{m-1} + \ldots + g_1x + g_0} \quad (g(x) \neq 0)$

is defined over the domain either of all rationals or of all real numbers.
For example:

(i) $y = (x^2 + 2x - 1)/(x^2 + 1)$ for x in the domain of all real numbers.
The graph (Fig. 3.5) can be plotted from a selection of rational
values of x and the corresponding values of y:

[Fig. 3.5]

As will be seen later (Chapter 9), the ‘limit’ of y is 1, as x increases
indefinitely in each direction.

(ii) $y = (x^3 - 1)/(x - 1)$ for x in the domain of all real numbers (x ≠ 1).
Here, division of x - 1 into $x^3 - 1$ gives $(x^3 - 1)/(x - 1) = x^2 + x + 1$.
The polynomial $x^2 + x + 1$ is defined for all real x and the rational
fraction $(x^3 - 1)/(x - 1)$ for all real x except x = 1; the two are equivalent
except at x = 1. (See 3.9 Ex. 9.)




The concept of a function will be introduced in a wider context in 
Chapter 7 and, from Chapter 9 onwards, we shall be largely concerned 
with functions f(x) of a real variable x, i.e. for x in the domain of all 
real numbers. We shall then have polynomials or rational fractions 
ready to hand as illustrations. 

Meanwhile, one further extension can be made; it suggests itself in
the present context. In the polynomial f(x) or the rational fraction
f(x)/g(x), replace x by a complex number α. Then

    $f(\alpha) = f_n\alpha^n + f_{n-1}\alpha^{n-1} + \ldots + f_1\alpha + f_0$

is itself a complex number, and so is f(α)/g(α). We are thus led to
the concept of a complex function, i.e. the function β = f(α) or f(α)/g(α)
over the field C of complex numbers. The usual notation for a complex
number is z. Hence we can write the polynomial function f(z),
or the rational fraction function f(z)/g(z), for z in the domain of all
complex numbers. The idea of a function of a complex variable is
generally regarded as difficult or advanced. This is nonsense. The
concept of f(z) as a function of a complex variable is no more difficult
than that of f(x) as a function of a real variable. The difference lies
in the field of numbers over which the functions are defined and hence
in the varying techniques for handling them (see 3.9 Ex. 12).

This line of development will be taken up later. At the moment, 
we have more urgent business on hand. 


3.6. Roots of polynomial equations. To pass from a polynomial
f(x) to a polynomial equation and its roots is not quite as simple as
it appears. It is not just a matter of writing f(x) = 0 and looking for
x = α as a root. This is over-working x. Strictly, f(x) = 0 is the zero
polynomial, all coefficients zero; it is not an equation. However, the
correct procedure is ready to hand, using the concepts of 3.5. A zero
α of a given polynomial f(x) is a number α such that the number
f(α) is zero. Consider a polynomial of positive degree* defined over
the field of rationals:

DEFINITION: The polynomial $f(x) = x^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0$ with
rational coefficients and of degree n > 0 has a zero α if f(α) = 0.

If we then care to say that the polynomial equation f(x) = 0 has a
root x = α, this is in order, but we must remember that we are using
short-hand. In a graphical representation (for real x) a zero of y = f(x)
occurs where the corresponding curve crosses Ox, and it gives a root
of the equation f(x) = 0.

* A polynomial of zero degree $f(x) = f_0$ has no zeros; the question does not arise.
For f(x) of positive degree (n > 0), we write the leading coefficient unity, as in (4) of
3.3, since the removal of a constant factor does not influence zeros.



The definition of a zero or root is completely neutral as to what kind
of number α is. It can be from the field of rationals R, real numbers R*
or complex numbers C according to our choice. The function y = f(x)
can be defined over any one of these fields. All that we have specified
is that f(x) is a polynomial (of degree n > 0) with rational coefficients.

For the moment suppose α is rational. The following results, much
used in elementary algebra, are proved very simply:


REMAINDER THEOREM: If the polynomial f(x) of degree n > 0, and
with rational coefficients, is divided by x - α, then the remainder is f(α),
i.e. f(x) = (x - α)g(x) + f(α) for some polynomial g(x) of degree n - 1.
Proof: let R be the remainder so that f(x) = (x - α)g(x) + R for any
rational x. Put x = α so that f(α) = 0 × g(α) + R, i.e. R = f(α).

Q.E.D.
As a corollary, it follows that: 


THEOREM: The polynomial f(x) of degree n > 0, and with rational
coefficients, has a zero α if and only if x - α divides f(x), i.e. if and only
if f(x) = (x - α)g(x) for some polynomial g(x) of degree n - 1.


Proof: directly, if x - α divides f(x), the remainder R = f(α) = 0 and
α is a zero of f(x). Conversely, if α is a zero of f(x), then f(α) = 0 and
f(x) = (x - α)g(x), i.e. x - α divides f(x). Q.E.D.
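
Both results can be checked numerically by the short division scheme often called synthetic division. The sketch below is in Python, with rational coefficients listed from the constant term upward; the function name `divide_by_linear` and the test polynomial $x^3 + x + 10$ are chosen here for illustration only. The scheme returns the quotient g(x) and the remainder R together, and R is seen to equal f(α).

```python
from fractions import Fraction


def divide_by_linear(coeffs, alpha):
    """Divide f(x) by (x - alpha); coeffs (non-empty) run from the constant term upward.

    Returns (g, R) with f(x) = (x - alpha) * g(x) + R, so that R = f(alpha).
    """
    carry = Fraction(0)
    partial = []
    for c in reversed(coeffs):         # leading coefficient first
        carry = carry * alpha + c
        partial.append(carry)
    remainder = partial.pop()          # the last value formed is f(alpha)
    return list(reversed(partial)), remainder


f = [10, 1, 0, 1]                      # x^3 + x + 10

g, R = divide_by_linear(f, 1)
print(g, R)                            # g = x^2 + x + 2, R = f(1) = 12

g, R = divide_by_linear(f, -2)
print(g, R)                            # R = 0, so (x + 2) divides f(x); g = x^2 - 2x + 5
```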

Notice that this result merely writes g(x) as some polynomial of
degree (n - 1). Nothing is implied about the rational number g(α).
If g(α) ≠ 0, then (x - α) is not a factor of g(x) and f(x) has only one
factor (x - α). Here α is a single root of f(x) = 0. If g(α) = 0, then
(x - α) is a factor of g(x) and f(x) has two or more factors (x - α).
Here α is a multiple root of f(x) = 0. (See 3.9 Ex. 13.)

As long as we stick to α as a rational, there is no implication that
f(x) = 0 has a root at all, or (if it has one) that there are further
roots arising from the quotient polynomial g(x). Certainly, there is
no implication that f(x) = 0 has n rational roots. We can, however,
get out of this situation.

Suppose now that α is real. We can look for a real zero of f(x), i.e.
a real α so that the real number f(α) = 0. The theorems above still
hold, and the proofs are formally unchanged. There is, however, a
subtle difference. If α is a real zero of f(x), then (x - α) divides f(x):

    f(x) = (x - α)g(x)

where g(x) is a ‘polynomial’ of degree (n - 1). The difference is that,
whereas f(x) has only rational coefficients, (x - α) and hence g(x)
have coefficients from the wider field of real numbers. In short, g(x)
need not be a polynomial from the set F[x] of polynomials with rational
coefficients. Irrational coefficients like √2 can appear both in (x - α)
and in g(x). Indeed, if α is irrational, then g(x) must contain irrational
coefficients.

It is clear, from what we know about the quadratic, that even real
values of α do not exhaust the possibilities. We look further, for α
as a complex zero of f(x), such that the complex number f(α) = 0.
Again the theorems above still hold. In factorising f(x) into
(x - α)g(x), it is possible for g(x) to have complex coefficients; indeed
if α = a + ib (b ≠ 0), then g(x) must contain such coefficients.

What remains to be shown is that we need go no further, i.e. that
all the roots of f(x) = 0 are in the field of complex numbers and that
there are precisely n of them. This is the fundamental theorem of
algebra, concerned (as is the corresponding fundamental theorem of
arithmetic) with factorisation. Various proofs of the theorem are
available but none of them is easy; indeed it seems not possible to
provide a proof purely in algebraic terms. It is remarkable that
algebra would appear to rest on a non-algebraic basis.


3.7. The fundamental theorem of algebra. The analogy between the
integral domain F[x] of polynomials f(x) and the integral domain J
of integers is our guide. We start off, exactly as in 3.1 for integers,
with the idea of getting the highest common factor (H.C.F.) of two
polynomials, of getting a division algorithm, and of factorising a
polynomial into prime (or irreducible) factors. The factors then lead
to the roots of f(x) = 0.

Take $f(x) = x^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0$ as a polynomial of degree
n > 0 and with rational coefficients and g(x) as a similar polynomial
but of degree m > 0. The H.C.F. of f(x) and g(x) is the polynomial of
highest degree which divides both. A division algorithm is developed
to isolate it. By the same argument as in 3.1, take m ≤ n and divide
g(x) into f(x):

    $f(x) = q_1(x)g(x) + r_1(x)$

where $q_1(x)$ is the quotient polynomial and $r_1(x)$ of degree < m is the
remainder. As long as the remainder is of positive degree, we carry on
dividing: $r_1(x)$ into g(x), then the new remainder into $r_1(x)$, and so on.
Since the successive remainders are of decreasing degree, it must
happen sooner or later that a remainder of zero degree (a constant)
results. If this constant is zero, the last remainder (before this stage)
is the H.C.F.; if it is not zero, f(x) and g(x) have no common factor of
positive degree. Two examples illustrate:


(i) $x^2 + 2x + 1$ and $x^2 - 1$.

First $\frac{x^2 + 2x + 1}{x^2 - 1}$ has quotient 1 and remainder 2(x + 1);
then $\frac{x^2 - 1}{x + 1}$ has quotient (x - 1) and no remainder.
H.C.F. = (x + 1).
Or: $(x^2 + 2x + 1) = (x^2 - 1) + 2(x + 1)$ and $(x^2 - 1) = (x - 1)(x + 1)$.

(ii) $x^4 + 3x^2 + 2$ and $x^3 - x^2 + 2x - 2$.

First $\frac{x^4 + 3x^2 + 2}{x^3 - x^2 + 2x - 2}$ has quotient (x + 1) and remainder $2(x^2 + 2)$;
then $\frac{x^3 - x^2 + 2x - 2}{x^2 + 2}$ has quotient (x - 1) and no remainder.
H.C.F. = $(x^2 + 2)$.
Or: $(x^4 + 3x^2 + 2) = (x + 1)(x^3 - x^2 + 2x - 2) + 2(x^2 + 2)$
and $(x^3 - x^2 + 2x - 2) = (x - 1)(x^2 + 2)$.


Jobbing backward, as in 3.1, the H.C.F. is expressed as:

    (polynomial) f(x) + (polynomial) g(x)

proving the result:

THEOREM: If c(x) is the H.C.F. of two polynomials f(x) and g(x) of
positive degree, then there exist polynomials φ(x) and ψ(x) so that

    φ(x) f(x) + ψ(x) g(x) = c(x)    (1)

In example (ii) above:

    $x^4 + 3x^2 + 2 = (x + 1)(x^3 - x^2 + 2x - 2) + 2(x^2 + 2)$
i.e. $\tfrac{1}{2}(x^4 + 3x^2 + 2) - \tfrac{1}{2}(x + 1)(x^3 - x^2 + 2x - 2) = x^2 + 2$ (H.C.F.)

so that φ(x) = ½ and ψ(x) = -½(x + 1).
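
The process of repeated division is easily mechanised. The following Python sketch is one possible arrangement, offered under stated assumptions: rational coefficients listed from the constant term upward, at least one of the two polynomials non-zero, and the H.C.F. normalised to leading coefficient unity (so that a last remainder $2(x^2 + 2)$ is reported as $x^2 + 2$). The names `strip_zeros`, `poly_divmod` and `poly_hcf` are invented for the illustration.

```python
from fractions import Fraction


def strip_zeros(p):
    """Drop zero coefficients that follow the last non-zero one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Return (quotient, remainder) of f(x) divided by g(x), g non-zero."""
    rem = strip_zeros(Fraction(c) for c in f)
    g = strip_zeros(Fraction(c) for c in g)
    quotient = [Fraction(0)] * max(len(rem) - len(g) + 1, 0)
    while len(rem) >= len(g):
        shift = len(rem) - len(g)
        factor = rem[-1] / g[-1]
        quotient[shift] = factor
        for i, c in enumerate(g):
            rem[shift + i] -= factor * c
        rem = strip_zeros(rem)
    return quotient, rem


def poly_hcf(f, g):
    """Divide repeatedly; the last non-zero remainder, made monic, is the H.C.F."""
    f = strip_zeros(Fraction(c) for c in f)
    g = strip_zeros(Fraction(c) for c in g)
    while g:                            # stop when the remainder is zero
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [c / f[-1] for c in f]       # scale to leading coefficient unity


# Example (i): x^2 + 2x + 1 and x^2 - 1
print(poly_hcf([1, 2, 1], [-1, 0, 1]))
# Example (ii): x^4 + 3x^2 + 2 and x^3 - x^2 + 2x - 2
print(poly_hcf([2, 0, 3, 0, 1], [-2, 2, -1, 1]))
```

If the two polynomials have no common factor of positive degree, the loop ends on a non-zero constant and the function returns the polynomial 1.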




A polynomial p(x) is irreducible in the field R of rationals if p(x) is
of degree n > 0 and if it has no polynomial factors with rational
coefficients other than 1 and p(x) itself. Exactly as for integers in
3.1, it follows from (1) that:


THEOREM: Every polynomial f(x) of positive degree and with rational
coefficients can be uniquely factorised into a product of irreducible
factors:

    $f(x) = p_1(x)\,p_2(x) \ldots p_k(x)$    (2)


We have followed the argument used for integers; but the result we
have obtained is not the end of the story as it was for integers. One
kind of irreducible polynomial is (x - α), for α rational. Among the
factors of (2), there may be some of linear type (x - α). It is, however,
not necessary that all of the factors, or indeed any of them, should
be of this form. Quadratics like $(x^2 - 2)$ and $(x^2 + 1)$ are irreducible
in the field R of rationals. It is necessary, therefore, to extend the
range of factors so that their coefficients are from the field of complex
numbers. Then a polynomial p(x) irreducible in the field of rationals
may become reducible (into factors) in the field of complex numbers,
e.g. $x^2 - 2 = (x - \sqrt{2})(x + \sqrt{2})$ and $x^2 + 1 = (x - i)(x + i)$.

At the same time, the limitation that the polynomial f(x) has
rational coefficients can be relaxed. The fundamental theorem of
algebra states that, if a polynomial with rational, real or complex
coefficients is considered over the field of complex numbers, then the
only irreducible factors are of linear type (x - α), for α complex. In
its simplest form:


FUNDAMENTAL THEOREM OF ALGEBRA: Every polynomial f(x) of
positive degree and with rational, real or complex coefficients is such that
f(α) = 0 for some complex number α.


This means that there is a complex root α of f(x) = 0 and hence, from
3.6 above, that (x - α) is a factor of f(x) for some complex α. Notice
that complex roots automatically include (as special cases) all real
roots, both rational and irrational.

The proof of the theorem (apparently) cannot be given in algebraic
terms. Those available involve topological concepts; one is sketched
in 15.2.

The fundamental theorem can be developed into a form more
directly usable in practice. The first step is to write the theorem in
the form: f(x) = 0 of degree n (> 0) has a complex root $\alpha_1$, and so a
factor $(x - \alpha_1)$, giving $f(x) = (x - \alpha_1)g(x)$. If n = 1, then g(x) = 1 and
the factorisation is complete: $f(x) = x - \alpha_1$. If n > 1, then g(x) is a
polynomial of degree (n - 1), with rational, real or complex coefficients.
Hence, g(x) = 0 of degree (n - 1) > 0 has a complex root $\alpha_2$ and so a
factor $(x - \alpha_2)$, giving $f(x) = (x - \alpha_1)(x - \alpha_2)h(x)$. Here h(x) = 1 if n = 2
and otherwise h(x) is a polynomial of degree (n - 2) > 0. This process
continues until a residual polynomial is obtained equal to unity;
this happens after n steps and so:

    $f(x) = (x - \alpha_1)(x - \alpha_2) \ldots (x - \alpha_n).$

Hence, f(x) of degree n has exactly n factors and f(x) = 0 has exactly n
roots in the field of complex numbers.

One further step can be taken when f(x) has rational or real
coefficients. If α = a + ib is a complex number, write α* = a - ib and
call it the conjugate of α. Then

    α + α* = (a + ib) + (a - ib) = 2a
and α × α* = (a + ib)(a - ib) = $a^2 + b^2$

which are real. On an Argand Diagram (Fig. 3.7) if α is represented by
P, then α* is P*, the reflection of P in Ox; the sum of α and
α* is the point Q on Ox. Since OP and OP* make equal and opposite
angles with Ox, the product of α and α* (adding the angles) is also a
point on Ox.

[Fig. 3.7]

In the relation obtained for f(x) with rational or real coefficients:

    $f(x) = x^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0 = (x - \alpha_1)(x - \alpha_2) \ldots (x - \alpha_n)$

replace each number (coefficient) by its conjugate:

    $x^n + a_{n-1}^*x^{n-1} + \ldots + a_1^*x + a_0^* = (x - \alpha_1^*)(x - \alpha_2^*) \ldots (x - \alpha_n^*).$

Any real coefficient is equal to its conjugate (α = α* if b = 0). This is so
for all the coefficients on the left (and for such of the α's on the right
as are real). Hence:

    $(x - \alpha_1)(x - \alpha_2) \ldots (x - \alpha_n) = (x - \alpha_1^*)(x - \alpha_2^*) \ldots (x - \alpha_n^*)$

i.e. the set $\alpha_1, \alpha_2, \ldots \alpha_n$ is simply a shuffling of the set $\alpha_1^*, \alpha_2^*, \ldots \alpha_n^*$,
real values remaining in the same place but complex values being
shifted. Hence, if complex α is a root of f(x) = 0, then so is the conjugate
α*.
The final and practical result obtained is:
Every polynomial f(x) of degree n > 0, and with rational or real
coefficients, has exactly n linear factors in the field of complex
numbers:
    $f(x) = (x - \alpha_1)(x - \alpha_2) \ldots (x - \alpha_n)$
and f(x) = 0 has exactly n roots $\alpha_1, \alpha_2, \ldots \alpha_n$. Some or all of the
roots may be real. Other roots occur in pairs of conjugate complex
numbers. If the degree n is odd, at least one and generally
an odd number of roots are real. If the degree n is even, there may
be no real roots and, in general, an even number of real roots.
As a final point, the factor (x - α) may appear only once, in which
case we have a single root. It may, however, appear twice or more
often, corresponding to a multiple root. Some examples illustrate:
(i) $x^3 - \tfrac{5}{2}x^2 + \tfrac{1}{2} = (x - \tfrac{1}{2})(x^2 - 2x - 1)$   irreducible in the field of rationals
        $= (x - \tfrac{1}{2})(x - 1 - \sqrt{2})(x - 1 + \sqrt{2})$
    i.e. $x^3 - \tfrac{5}{2}x^2 + \tfrac{1}{2} = 0$ has three real roots ½, 1 ± √2, one rational
    and two irrational.
(ii) $x^3 + x + 10 = (x + 2)(x^2 - 2x + 5)$   irreducible in the field of real numbers
        $= (x + 2)(x - 1 - 2i)(x - 1 + 2i)$
    i.e. $x^3 + x + 10 = 0$ has one real root -2 and the conjugate
    complex pair (1 ± 2i).
(iii) $x^5 - x^4 - 2x^3 + 2x^2 + x - 1 = (x - 1)(x - 1)(x - 1)(x + 1)(x + 1)$
        $= (x - 1)^3(x + 1)^2.$
    The corresponding equation has all five roots real, one triple
    (1) and one double (-1).
(iv) $x^4 - 4x^3 + 10x^2 - 12x + 9 = (x^2 - 2x + 3)^2$   irreducible for real numbers
        $= (x - 1 - i\sqrt{2})^2(x - 1 + i\sqrt{2})^2.$
    The equation has four complex roots, a double conjugate
    complex pair (1 ± i√2).
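
Factorisations of this kind can be checked by multiplying the factors out again. The Python sketch below does so, reading the four examples as $x^3 - \tfrac{5}{2}x^2 + \tfrac{1}{2}$, $x^3 + x + 10$, $x^5 - x^4 - 2x^3 + 2x^2 + x - 1$ and $x^4 - 4x^3 + 10x^2 - 12x + 9$; coefficients are listed from the constant term upward, and the helper names `poly_mul` and `from_roots` are assumptions of this sketch. Python's built-in complex numbers serve for the conjugate pair in (ii).

```python
from fractions import Fraction
from functools import reduce


def poly_mul(p, q):
    """Product of two polynomials; coefficients run from the constant term upward."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def from_roots(roots):
    """Coefficients of (x - r1)(x - r2)...(x - rn)."""
    return reduce(poly_mul, ([-r, 1] for r in roots), [1])


# (i)   (x - 1/2)(x^2 - 2x - 1)
print(poly_mul([Fraction(-1, 2), 1], [-1, -2, 1]))
# (ii)  (x + 2)(x - 1 - 2i)(x - 1 + 2i)
print(from_roots([-2, 1 + 2j, 1 - 2j]))
# (iii) (x - 1)^3 (x + 1)^2
print(from_roots([1, 1, 1, -1, -1]))
# (iv)  (x^2 - 2x + 3)^2
print(poly_mul([3, -2, 1], [3, -2, 1]))
```

In (ii) the imaginary parts cancel in the product, leaving the real coefficients 10, 1, 0, 1; this is the conjugate-pair result at work.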




3.8. The nth roots of unity. It may seem rather like gilding the lily to
ask for the nth roots of unity, i.e. the zeros of the polynomial $x^n - 1$
or the roots of the equation $x^n = 1$. Surely, it may be said, the root 1
(the zero x = 1) is enough. However, a question emerges immediately.
By the result just obtained, the polynomial $x^n - 1$ of degree
n has exactly n zeros. One of them is 1; what are the other (n - 1)?
The investigation of the subject has a fascination for pure mathematicians,
and to pursue it far would take us deep into number
theory. On the other hand, a quick look at the problem is not a
useless exercise; it provides an illustration of a ‘transformation
group’ (6.4 below).

Let us attempt a direct attack. We require n roots ω such that
$\omega^n = 1$. One root is easy: ω = 1. Take out the factor (x - 1) from
$(x^n - 1)$:

    $\frac{x^n - 1}{x - 1} = x^{n-1} + x^{n-2} + \ldots + x + 1$

which is just a geometric progression (in reverse order) with common
ratio x and sum of n terms $(x^n - 1)/(x - 1)$ as shown. Hence the other
(n - 1) roots come from:

    $x^{n-1} + x^{n-2} + \ldots + x + 1 = 0.$

This looks easy, but isn't. If n is odd, nothing else is obvious enough
to try. If n is even (n = 2m), then ω = -1 is a root as well as ω = 1:

    $\frac{x^{2m-1} + x^{2m-2} + \ldots + x + 1}{x + 1} = x^{2m-2} + x^{2m-4} + \ldots + x^2 + 1.$

Or, from the beginning:

    $(x^{2m} - 1)/(x^2 - 1) = x^{2m-2} + x^{2m-4} + \ldots + x^2 + 1$

i.e. a geometric progression of m terms with common ratio $x^2$. But
we are back with essentially the same polynomial for the other roots,
in $x^2$ now:

    $(x^2)^{m-1} + (x^2)^{m-2} + \ldots + (x^2) + 1.$

Again there is nothing obvious to try.
As an alternative, let us try to sneak up on the problem, by
working out the solution for small values of n ≥ 1:
    n = 1:  $x - 1$                              One root, ω = 1
    n = 2:  $x^2 - 1 = (x - 1)(x + 1)$           Two roots, ω = ±1
    n = 3:  $x^3 - 1 = (x - 1)(x^2 + x + 1)$     Three roots, ω = 1, ½(-1 ± i√3)
            where the pair of conjugate complex roots are from the quadratic
    n = 4:  $x^4 - 1 = (x^2 - 1)(x^2 + 1)$       Four roots, ω = ±1, ±i.

For n > 4, we cannot continue, in purely algebraic terms.* But we
can make progress in graphical terms with the aid of the Argand
Diagram for complex numbers, here the nth roots of unity. We get,
first, a broad hint from the following facts about the roots where
n = 2, 3 and 4. For n = 2, the two roots (±1) can be shown: ω = -1,
$\omega^2 = 1$. For n = 3, the three roots are:

    $\omega = \tfrac{1}{2}(-1 + i\sqrt{3})$,   $\omega^2 = \tfrac{1}{4}(-1 + i\sqrt{3})^2 = \tfrac{1}{2}(-1 - i\sqrt{3})$,   $\omega^3 = 1$.

For n = 4, the four roots are: ω = i, $\omega^2 = -1$, $\omega^3 = -i$, $\omega^4 = 1$. This
suggests (no more) that, if ω is one of the roots of $x^n = 1$ not previously
obtained as a lower root of unity, then the n roots are: ω, $\omega^2$, $\omega^3$,
... $\omega^n$. By definition, $\omega^n = 1$ so that the last root is 1, as required.

* Indeed, it is a general point that polynomial equations of degree 5 and higher are
of a different nature from those of degree 4 and lower; there seems to be a ‘sound
barrier’ at n = 5.

To indicate (if not to prove strictly) that this is so, we use an Argand
Diagram. The roots of $x^2 = 1$ are ω = -1 at A′ and $\omega^2 = 1$ at A. The
roots of $x^3 = 1$ are:

    $\omega = \tfrac{1}{2}(-1 + i\sqrt{3})$ at C;   $\omega^2 = \tfrac{1}{2}(-1 - i\sqrt{3})$ at D;   and $\omega^3 = 1$ at A.

To locate C and D (Fig. 3.8a), note first that D is the reflection of C
in Ox (being conjugate). Let CD cut Ox in N. The triangle ONC is
right-angled, with ON = ½, NC = √3/2 and OC = 1. The angle NOC is
60° (cos 60° = ON/OC = ½); and angle AOC is 120°. Hence C is 120°
round the unit circle from A, and D is another 120° round from C.
The three cube roots of unity are: C(ω), D($\omega^2$) and A($\omega^3$), making up
an equilateral triangle inscribed in the unit circle. Finally, the roots
of $x^4 = 1$ are:

    ω = i at B;   $\omega^2 = -1$ at A′;   $\omega^3 = -i$ at B′;   and $\omega^4 = 1$ at A.

They form a square in the unit circle: B(ω), A′($\omega^2$), B′($\omega^3$) and A($\omega^4$).

[Fig. 3.8a]

The generalisation is clear: the nth roots of unity correspond to
the vertices of a regular n-sided polygon inscribed in the unit circle,
the last vertex being at A. The polygon is CDEF ... A in Fig. 3.8b,
where the ∠AOC = (1/n) 360°. To indicate that this is so: let C correspond
to a complex number ω. Then $\omega^2 = \omega \times \omega$ is the point P with
$OP = OC^2 = 1$ and ∠AOP = 2∠AOC = (2/n) 360° (see 2.5). The point P
is D. And so on round the unit circle, until $\omega^n = 1$ is obtained as the
point A. Hence, the complex number ω, at the point C, is one of the
nth roots of unity. So are all the powers of ω up to $\omega^n$. For:

    $(\omega^2)^n = (\omega^n)^2 = 1^2 = 1$ ...

The general result is: 


THEOREM: The nth roots of unity, complex numbers ω such that $\omega^n = 1$,
are shown on an Argand Diagram as the vertices of a regular n-sided
polygon CDEF ... A inscribed in the unit circle (Fig. 3.8b). They can be
expressed:

    ω, $\omega^2$, $\omega^3$, ... $\omega^n = 1$

where ω is the complex number of the first vertex C with ∠AOC = 360°/n.

For n > 4, it is not easy to represent ω and its powers algebraically
in terms of (e.g.) surds such as square roots. It is still true that
ω = a + ib for some real a and b and rational approximations to a and
b are given by trigonometric tables.* However, the general result in
diagrammatic form is often enough.
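
In place of trigonometric tables, the approximations can nowadays be produced by a few lines of Python. The sketch below is an illustration only: it takes ω = cos(360°/n) + i sin(360°/n), forms the powers ω, ω², ... ωⁿ in floating-point arithmetic, and the function name `roots_of_unity` is hypothetical. The values are approximate, so any check of $\omega^n = 1$ must allow a small tolerance.

```python
import cmath
import math


def roots_of_unity(n):
    """Approximate values of w, w^2, ..., w^n = 1, where w lies at angle 360/n degrees."""
    w = cmath.rect(1.0, 2 * math.pi / n)     # modulus 1, argument 2*pi/n radians
    return [w ** k for k in range(1, n + 1)]


for z in roots_of_unity(5):
    # each vertex of the regular pentagon, and its fifth power (close to 1)
    print(round(z.real, 4), round(z.imag, 4), abs(z ** 5 - 1) < 1e-9)
```

For n = 3 and n = 4 the same function reproduces, to rounding error, the roots found algebraically: ½(-1 ± i√3), 1 and i, -1, -i, 1.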


3.9. Exercises 


1. Express 1092 and 330 as products of primes and show that the H.C.F.
is 6. Check by the process of repeated division (Division Algorithm) and show 
that: 13 x 1092 — 43 x 330 =6. 


* Since a =cos (360°/n) and b =sin (360°/n). 




2. Gaussian integers. The set α = m + in (m and n integers) satisfies all the
operational rules of 2.2 except that reciprocals are lacking, i.e. it is an integral
domain. Check the rules for addition, noting that zero is 0 = 0 + i × 0 and that
-α = (-m) + i(-n). Check the first three rules for products, note that unity
is 1 = 1 + i × 0, but that this does not provide a reciprocal $\alpha^{-1}$ of α. Check the
distributive rule. Finally show that there are no zero divisors by writing
α = m + in and β = p + iq and showing that αβ = 0 implies either

    m = n = 0 (α = 0) or p = q = 0 (β = 0)

or both. (Equate real and imaginary parts.)

3. Show that 13 = (2 + 3i)(2 - 3i) = (3 + 2i)(3 - 2i) and two other pairs of
factors obtained by multiplying those written by -1. Deduce that 13 is not
prime as a Gaussian integer, being uniquely factored: 13 = (2 + 3i)(2 - 3i),
allowing for the four units ±1, ±i.

4. Show that 1 - 3i = (1 - i)(2 - i) and 2i = $(1 + i)^2$ are unique factors.

5. Polynomials with integral coefficients. Illustrate the limitations of a 
polynomial with coefficients confined to integers by reference to 


    $2x^2 - 5x - 3 = 2(x^2 - \tfrac{5}{2}x - \tfrac{3}{2})$   and   $x^2 - \tfrac{5}{2}x - \tfrac{3}{2} = \tfrac{1}{2}(2x^2 - 5x - 3)$.


Generalise to show (i) that a polynomial with integral coefficients cannot 
generally be written with leading coefficient unity, but (ii) that a polynomial 
with rational coefficients can always be written (apart from a rational factor) 
as a polynomial with integral coefficients. 

6. From (2) of 3.3 for products of polynomials, show that: 


(0, 1, 0) x (0, 1, 0) =(0, 0, 1, 0,0); (0,1, 0) x (0, 0, 1) =(0, 0, 0, 1, 0); 
(0, 0, 1) x (0, 0, 1) =(0, 0, 0, 0, 1) 


and interpret as $x \times x = x^2$, $x \times x^2 = x^3$, $x^2 \times x^2 = x^4$.
7. Scalar Multiplication. From (2) of 3.3 show that 


    (k, 0, 0)(a, b, c) = (ka, kb, kc).


Interpret as $k(a + bx + cx^2) = (ka) + (kb)x + (kc)x^2$ and examine the case k = 1/c.
The polynomial (k, 0, 0) = k, a rational scalar. This result is ‘scalar multiplication’
of polynomials, a process of very wide application.


1+? 2 1 --αϑ d 1+2° 2 


8. Show: γ πὸ = +1 ye? T-x 1a 


——— ..]: —— = = 2 4 
}-2? 1-2? 1; To (1 +a +2"). 


Taking each rational fraction as a function of x, write an appropriate domain 
of x in each case. Is the domain different in any of the reduced forms above? 

9. Plot the graph of y = (x³ − 1)/(x − 1) defined on the domain of all real
x (x ≠ 1). Show that it is identical with the graph of the quadratic x² + x + 1
except that the point where x = 1 is missing. Deduce that the range of y is


y ≥ ¾ (y ≠ 3).
32. 2 2 
10. Show that the graph of y = =1+ πε + is of form indicated in 




Fig. 3.9. Why is y not defined at x = ±1? With the idea of a limit (Chapter 9),
we say that y → 1 as x → ±∞, giving an ‘asymptote’ y = 1.


Fig. 3.9


*11. Polynomials over the field of integers (mod n). Generalise the results of
2.9 Ex. 22 and 23 to show that polynomials can be defined over the integers
(mod n) but that they are only well-behaved if n is prime and the integers
(mod n) form a field. What goes wrong if n is not prime? Consider the field of
integers (mod 2) and show that there are then only four quadratics: x², x² + 1,
x² + x and x² + x + 1. Noting that (x + 1)² = x² + 1 (mod 2), show that there is
only one irreducible quadratic among the four.
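The last claim can be confirmed by brute force. The sketch below assumes coefficients are listed constant term first and reduced (mod 2); it multiplies every pair of the two linear polynomials x and x + 1 and removes the resulting products from the list of four quadratics. The function name is hypothetical.

```python
from itertools import product

def poly_mul_mod2(p, q):
    """Product of two polynomials over the integers (mod 2), constant term first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % 2
    return tuple(out)

linear = [(0, 1), (1, 1)]                      # x and x + 1
reducible = {poly_mul_mod2(p, q) for p in linear for q in linear}
quadratics = [(c0, c1, 1) for c0, c1 in product((0, 1), repeat=2)]
irreducible = [q for q in quadratics if q not in reducible]
print(irreducible)   # [(1, 1, 1)], i.e. x^2 + x + 1
```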

12. Polynomial function of a complex variable. Consider f(z) = z² − 1 as a
function of z = x + iy. Write f(z) as the complex value X + iY and show that
X = x² − y² − 1 and Y = 2xy (by equating real and imaginary parts). If z² − 1 = 0,
show that X = 0, Y = 0 together, i.e. x = ±1, y = 0, i.e. z = ±1. Deduce that a
polynomial equation in a complex variable can have real roots. Examine
f(z) = z² + 1 similarly and show that z² + 1 = 0 has roots z = ±i.

13. Multiple roots. Show that f(x) = 0, a polynomial equation with rational
coefficients, has a double rational root α if and only if f(x) = (x − α)²g(x) where
g(x) is a polynomial with rational coefficients such that g(α) ≠ 0. Extend to an
r-fold root α. How is the result affected if α is real or complex? Establish the
following and find the other roots (if any) in each case.


Polynomial                      Multiple roots
x³ − 3x² + 4                    2 twice
x⁴ + 2x³ − 2x − 1               −1 three times
x⁴ − 4x² + 4                    √2 twice, −√2 twice
x⁴ − 4x³ + 8x² − 8x + 4         1 + i twice, 1 − i twice
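A root's multiplicity can be counted by dividing out (x − α) repeatedly until a non-zero remainder appears. The sketch below is one way to do this, assuming coefficients are listed highest power first; the helper names are hypothetical. The √2 case is omitted because floating-point arithmetic would not give an exactly zero remainder.

```python
def divide_by_linear(coeffs, alpha):
    """Synthetic division by (x - alpha); coefficients are highest power first.
    Returns (quotient coefficients, remainder)."""
    acc = []
    carry = 0
    for c in coeffs:
        carry = carry * alpha + c
        acc.append(carry)
    return acc[:-1], acc[-1]

def multiplicity(coeffs, alpha):
    """Number of times (x - alpha) divides the polynomial exactly."""
    count = 0
    while len(coeffs) > 1:
        quotient, remainder = divide_by_linear(coeffs, alpha)
        if remainder != 0:
            break
        coeffs = quotient
        count += 1
    return count

print(multiplicity([1, -3, 0, 4], 2))            # 2
print(multiplicity([1, 2, 0, -2, -1], -1))       # 3
print(multiplicity([1, -4, 8, -8, 4], 1 + 1j))   # 2
```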




14. Show that x − 1 is the H.C.F. of x⁴ − x³ + x − 1 and x³ − x² + x − 1 by
dividing out:
x⁴ − x³ + x − 1 = x(x³ − x² + x − 1) − (x² − 2x + 1)
x³ − x² + x − 1 = (x + 1)(x² − 2x + 1) + 2(x − 1)
x² − 2x + 1 = (x − 1)(x − 1) (with no remainder).
Hence show: φ(x)(x⁴ − x³ + x − 1) + ψ(x)(x³ − x² + x − 1) = x − 1
where φ(x) = ½(x + 1) and ψ(x) = −½(x² + x − 1).
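The dividing-out process of Ex. 14 is Euclid's algorithm applied to polynomials. A minimal sketch follows, assuming coefficients are rational and listed highest power first, with the two polynomials taken as x⁴ − x³ + x − 1 and x³ − x² + x − 1 (the forms factored in Ex. 15); the function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[0] == 0:
        p.pop(0)
    return p

def poly_divmod(f, g):
    """Quotient and remainder on dividing f by g over the rationals."""
    rem = [Fraction(c) for c in f]
    g = trim([Fraction(c) for c in g])
    quotient = []
    for _ in range(max(len(rem) - len(g) + 1, 0)):
        factor = rem[0] / g[0]
        quotient.append(factor)
        for i, b in enumerate(g):
            rem[i] -= factor * b
        rem.pop(0)                      # the leading term has been cancelled
    return trim(quotient or [Fraction(0)]), trim(rem or [Fraction(0)])

def poly_gcd(f, g):
    """H.C.F. of f and g, scaled to have leading coefficient unity."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g != [0]:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [c / f[0] for c in f]

print(poly_divmod([1, -1, 0, 1, -1], [1, -1, 1, -1]))
print(poly_gcd([1, -1, 0, 1, -1], [1, -1, 1, -1]) == [1, -1])   # True: the H.C.F. is x - 1
```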

15. Show that the following factors are irreducible in the field of rationals:
x⁴ − x³ + x − 1 = (x − 1)(x + 1)(x² − x + 1); x³ − x² + x − 1 = (x − 1)(x² + 1).
Write the factors of the quadratics in the field of complex numbers and hence
find all zeros of each of the original polynomials.

16. If xⁿ + aₙ₋₁xⁿ⁻¹ + aₙ₋₂xⁿ⁻² + ... + a₁x + a₀ = 0 is any polynomial equation,
establish the following criteria as practical guides:
(i) x = 0 is a root if a₀ = 0
(ii) x = 1 is a root if the coefficients sum to zero: 1 + aₙ₋₁ + aₙ₋₂ + ... = 0
(iii) x = −1 is a root if the coefficients, alternating in sign, sum to zero:
1 − aₙ₋₁ + aₙ₋₂ − aₙ₋₃ + ... = 0.


Examine the polynomials of Ex. 13 and 15 above in the light of these criteria. 

17. Show that a polynomial of degree 2n containing only even powers
(x², x⁴, ... x²ⁿ) has zeros to be found from a polynomial of degree n in x².
Illustrate by showing that x⁴ − 4x² + 4 has zeros x² = 2 (twice), i.e. x = ±√2
(each twice). What can be said of a polynomial of degree 2n + 1 containing no
constant term and only odd powers (x, x³, ... x²ⁿ⁺¹)? Show x⁵ − 2x³ + 2x = 0
has roots 0, ±√(1 ± i).

*18. Residue classes. Divide the set J of all integers into five (non-overlapping
and exhaustive) subsets Jᵣ (r = 0, 1, 2, 3, 4) according to the remainder
r on division of an integer by 5. J₀ is {... −10, −5, 0, 5, 10, ...}; write the
other four. Show that the Jᵣ’s can be represented sufficiently by the corresponding
r’s and that the set of integers {0, 1, 2, 3, 4} (mod 5) results. The Jᵣ’s
are called residue classes of J. Carry out the same process in dividing the set
F[x] of all polynomials over the field of rationals into residue classes according
to the remainder on division by x² − 2. Show that each residue class corresponds
to a linear polynomial ax + b, for various rational a and b.

*19. Field of polynomials (mod x² − 2). Continuing from Ex. 18, show that, if
ax + b is the remainder on dividing f(x) by x² − 2, then ax + b is obtained by
substituting x² = 2, x³ = 2x, x⁴ = 4, x⁵ = 4x, ... in f(x), a Remainder Theorem
similar to that of 3.6. Hence show that the remainders obey all the operational
rules for + and ×, including reciprocals and division. In particular:

(ax + b)(cx + d) = (ad + bc)x + (2ac + bd);

(ax + b)/(cx + d) = [(bc − ad)x + (2ac − bd)]/(2c² − d²).

(For the second, multiply numerator and denominator by cx − d; for both, put
x² = 2.) Hence the set of remainders is a field, the field of polynomials (mod
x² − 2). The elements follow all elementary algebraic processes, provided x² is
written 2.
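The two formulas of Ex. 19 can be tried out numerically. The sketch below assumes an element ax + b of the field is stored as the pair (a, b) of rationals; the names `mul` and `div` are hypothetical. Dividing and then multiplying back should return the starting element.

```python
from fractions import Fraction

def mul(p, q):
    """(ax + b)(cx + d) with x^2 written as 2; elements are pairs (a, b) meaning ax + b."""
    a, b = p
    c, d = q
    return (a * d + b * c, 2 * a * c + b * d)

def div(p, q):
    """(ax + b)/(cx + d), obtained by multiplying top and bottom by cx - d."""
    a, b = p
    c, d = q
    denom = Fraction(2 * c * c - d * d)   # zero only if c = d = 0, since 2 is not a rational square
    return ((b * c - a * d) / denom, (2 * a * c - b * d) / denom)

p = (Fraction(1), Fraction(3))    # x + 3
q = (Fraction(2), Fraction(-1))   # 2x - 1
print(div(p, q))                  # the pair (1, 1), i.e. x + 1
print(mul(div(p, q), q) == p)     # True
```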

*20. Other polynomial fields. As in Ex. 19, obtain the field of polynomials
(mod x² + 1) as the set of linear polynomials ax + b with x² = −1, i.e. with x
interpreted as i. Indicate the generalisation: if F[x] is the integral domain of
polynomials f(x) over the field F, then taking remainders on dividing f(x) by
a specified polynomial g(x) gives a field, the field of polynomials {mod g(x)},
provided that g(x) is irreducible in F.


CHAPTER 4 


SETS 


4.1. The basic concept of a set. The idea of a set is at the basis of all 
mathematics. As a description of the idea it is enough to say: 


A set is a collection of well-defined objects thought of as a whole. 


The qualification ‘well-defined’ is essential; the objects must be 
precisely specified. The specification may be by listing all the objects, 
or it may be the provision of a property or formula which describes 
some common characteristic of the objects. As a result we can adopt 
convenient notations for a set, writing a capital letter for the set 
itself and small letters for the objects of the set: 


A = {a, b, c, ...} = {a | a is P}.


The first is particularly appropriate when the objects are listed, the 
second when they are characterised by some formula or property P. 
In the second notation, the vertical line | is to be read ‘such that’; A 
is the set of all objects a such that ‘a is P’, describing the common 
property.

The ‘objects’ which are the members or elements of the set may be 
entities of any kind. The basic concept of a set includes the idea that 
a set is composed of members and that members belong to the set. 
To denote that the object a is ‘a member of’ or ‘belongs to’ the set A, 
we write a ∈ A. Two examples illustrate. The ten digits used in the
decimal system of writing rational numbers make up a set A; the 
members of A are digits, particular marks on the paper. A can be 
listed, or it can be described by a property: 


A ={0, 1, 2, 3, 4, 5, 6, 7, 8, 9} ={a | a is a digit}. 

The boys assembled in the Lower Third of a named school at a
specified time compose a set B. The ‘objects’ are now boys; they are
described by the property given and they can be listed (e.g. in the
class register if it is well-kept). So Brown of the Lower Third belongs
to B if he was present at the time, whereas Green, an absent Lower
Thirder, does not. We can write Brown ∈ B and Green ∉ B.

Finally, the words ‘thought of as a whole’ indicate that the concept 
of a set is somewhat sophisticated. The objects may be easily com- 
prehended but a collection of objects is a further and abstract idea; 
the totality is more than the sum of the parts. A boy is an ‘object’ 
easily recognised; the set of boys making up the Lower Third is a 
little more difficult. Or, to take another set of persons, even if we 
know what father, mother, aunts and uncles are, a set of them 
requires the idea of members of the same generation in a 
family. 

Having said all this, we still have the task of postulating the 
properties we wish a set to possess. We might try to take the descrip- 
tion of a set given above as a definition and then to deduce properties. 
This would not take us very far. It seems better, in the end, to take 
‘set’ as a primitive (undefined) concept and ‘member of’ as a primitive 
(undefined) relation. We are then free to define other concepts, such 
as ‘subset’, in terms of them, and to lay down as axioms the properties 
we find we need. Set theory, though basic in mathematics, is rela- 
tively new, stemming from the work of Cantor (1845-1918). An 
axiomatic formulation is an even more recent development and the 
one followed here is essentially that of Zermelo (1871-1956). The set 
of Zermelo’s axioms, suitably adjusted to the present approach, is 
shown formally in 15.3, but what the axioms attempt to do is quite 
easily described in general terms. A first axiom asserts that a set is 
completely fixed by specifying its members, so making possible the 
notation A ={a, b,c, ...}. Two axioms follow to permit sets to be 
built up from smaller ones and to be obtained by breaking down 
larger sets into subsets. Another pair of axioms is required to allow 
sums and products of sets. For two sets, summation yields their 
‘union’ as the set of members belonging to one set or the other. 
One product is the ‘intersection’ of two sets, the set of members 
belonging to one set and the other. A second kind of multiplication 
gives the ‘Cartesian product’, the set got by selecting pairs of 
members, one from each set.* 

* The intersection is sometimes called the ‘inner product’ and the Cartesian product
the ‘outer product’ of the two sets.

Even these five axioms are not enough. Many sets are finite,
having n members, where n is a positive integer. Those quoted above
are instances. But more usually we handle ‘infinite’ sets, e.g. the set 
of all integers or of all rationals. The five axioms are then found to be 
insufficient. Indeed, the step from the ‘finite’ to the ‘infinite’ is a 
tremendous one, and one of the utmost importance. The step is from 
the set of natural numbers {1, 2, 3, ... n} to the set of all natural num- 
bers {1, 2, 3,...n,...}. In adding ... to n, we are making a major 
conceptual advance into completely new and open country. It will 
occupy our attention later in this chapter. 

Many sets, such as those of Chapter 2, have numbers as elements. 
A different kind of set is met in Chapter 3, a set of polynomials each 
member of which is itself a set (ordered sequence) of coefficients. 
Here is a case of a set of sets. A further instance is a set of ‘matrices’, 
where a ‘matrix’ is an ordered arrangement of entities (e.g. numbers) 
in rows and columns (Chapter 13). Still another kind is a set of 
operations and particularly of transformations (Chapters 6 and 7). 
There is no end to the different sorts of sets which can be specified. 
The examples below provide instances of sets of people, or other 
everyday entities. 

A further notation is needed to indicate cases where one set is 
contained within another, a concept roughly corresponding to ‘less 
than’ for ordered numbers or magnitudes. Write A ⊆ B for A as a
subset of B if each element of A belongs also to B. This includes
A = B, A identical with B, as the particular case where A and B
contain precisely the same elements. Hence it is possible to write
both A ⊆ B and B ⊆ A, simply meaning A = B. Write A ⊂ B for A
as a proper subset of B where A ⊆ B but A ≠ B. Some examples
illustrate:

(i) J = {... −2, −1, 0, 1, 2, ...} = {n | n an integer} as the set of all
integers. Then J+ = {1, 2, 3, ...} = {n | n a positive integer} is a proper
subset of J.

(ii) Six people sit down to dinner: the host and hostess and two
married couples, X and X’, Y and Y’ respectively. The table is a
rectangle and the host and hostess sit one at each end. The other four
places are labelled (a), (b), (c), (d). The set S of all arrangements of
the seating consists of 24 elements, listed below.

Fig. 4.1a. Seating plan: places (b) and (a) along one side of the table,
(c) and (d) along the other, with the hostess at one end and the host at
the other.

A proper subset S₁ of S consists of all arrangements in which each
married couple is separated, i.e. X and X’ do not sit together, neither
do Y and Y’. There are 16 elements in S₁, as seen from the listing
below. There are four arrangements (numbered 7, 9, 20 and 23)
where the sexes alternate round the table: host, female, male, hostess,
male, female. This is also a proper subset S₂ of S. The two subsets S₁
and S₂ overlap, having two elements in common (numbered 9 and
20). This subset S₃ of two elements comprises the alternative arrangements
which the hostess probably has in mind: separating the sexes
and the married couples. S₃ is a proper subset both of S₁ and of S₂.


List  Seats              List  Seats              List  Seats
no.   (a) (b) (c) (d)    no.   (a) (b) (c) (d)    no.   (a) (b) (c) (d)
 1    X   X’  Y   Y’      9    X’  Y   X   Y’     17    Y   Y’  X   X’
 2    X   X’  Y’  Y      10    X’  Y   Y’  X      18    Y   Y’  X’  X
 3    X   Y   X’  Y’     11    X’  Y’  X   Y      19    Y’  X   X’  Y
 4    X   Y   Y’  X’     12    X’  Y’  Y   X      20    Y’  X   Y   X’
 5    X   Y’  X’  Y      13    Y   X   X’  Y’     21    Y’  X’  X   Y
 6    X   Y’  Y   X’     14    Y   X   Y’  X’     22    Y’  X’  Y   X
 7    X’  X   Y   Y’     15    Y   X’  X   Y’     23    Y’  Y   X   X’
 8    X’  X   Y’  Y      16    Y   X’  Y’  X      24    Y’  Y   X’  X
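The counts quoted for S, S₁, S₂ and S₃ can be reproduced by enumeration. The sketch below rests on assumptions not spelled out in the text: that X’ and Y’ are the wives, that the only places where two guests "sit together" are (a) with (b) and (c) with (d), and that the listing is in dictionary order X, X’, Y, Y’. All names in the code are hypothetical.

```python
from itertools import permutations

guests = ["X", "X'", "Y", "Y'"]
# Each arrangement is (guest at a, guest at b, guest at c, guest at d);
# permutations() yields them in the same order as the numbered listing.
S = list(permutations(guests))

def couple_apart(arr, husband, wife):
    """True unless the couple occupies the neighbouring places (a, b) or (c, d)."""
    i, j = arr.index(husband), arr.index(wife)
    return {i, j} not in ({0, 1}, {2, 3})

S1 = [k for k, arr in enumerate(S, start=1)
      if couple_apart(arr, "X", "X'") and couple_apart(arr, "Y", "Y'")]

wives = {"X'", "Y'"}
# Sexes alternate round the table when the wives take places (a) and (d)
S2 = [k for k, arr in enumerate(S, start=1)
      if arr[0] in wives and arr[3] in wives]

S3 = [k for k in S1 if k in S2]
print(len(S), len(S1), S2, S3)   # 24 16 [7, 9, 20, 23] [9, 20]
```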


(iii) In a tribe, all members belong to one class (upper class) or
another (lower class). Marriage is permitted only between men and
women of the same class; sons take the same class as their parents,
daughters the other class. (The idea here may be the prevention
of in-breeding.) The set S of 16 people, shown in a family tree,
consists of three generations, the grandparents being an upper class
couple and a lower class couple respectively. One proper subset S₁
consists of the three unmarried men; another S₂ of the five unmarried
women. The marriages to be arranged pair off an element of S₁ with
a suitable element of S₂ (M₁ with F₁, M₂ with F₂). The marriage laws
are such that M₁ can marry his mother’s sister or his mother’s
brother’s daughter; he cannot marry his father’s sister (or his
father’s brother’s daughter if he has one).

Fig. 4.1b. Family tree. M: Male; F: Female; =: Married.
Subscripts 1: Upper Class; 2: Lower Class.

4.2. Operations on sets. Consider a given totality of elements and form 
all possible sets A, B, C, ... from the elements. There are two special 
(limiting) sets: the universal set U of all elements and the empty set
∅ of no elements. For completeness, these are included with other
sets of which they form the bounds:


∅ ⊆ A ⊆ U for any set A.


Three operations are defined. One is a unary operation, involving a
single set A, and gives the complement A’ of A as the set of all
elements not in A. The other two are binary operations, involving a
pair of sets A and B. One gives the union A ∪ B as the set of elements
which are in A, in B or in both, the other the intersection A ∩ B as
the set of elements which are in both A and B.


DEFINITION: The complement A’ of A is the set of those and only
those elements which are not in A. The union A ∪ B of A and B is the
set of those and only those elements which are in A or in B (or both).
The intersection A ∩ B of A and B is the set of those and only those
elements which are in A and in B.

The symbols ∪ and ∩ may be read ‘cup’ and
‘cap’ respectively.
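The three operations correspond directly to operations on Python's built-in sets. In this small sketch the universal set U and the sets A and B are arbitrary illustrative choices, and `complement` is a hypothetical helper, since a complement only has meaning relative to a stated universal set.

```python
U = set(range(1, 11))      # illustrative universal set {1, ..., 10}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(s, universe=U):
    """Elements of the universal set which are not in s."""
    return universe - s

print(sorted(complement(A)))   # [5, 6, 7, 8, 9, 10]
print(sorted(A | B))           # [1, 2, 3, 4, 5, 6]   union: in A or in B (or both)
print(sorted(A & B))           # [3, 4]               intersection: in A and in B
```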

These operations on sets can be illustrated by drawing what are termed
Venn Diagrams. The universal set U is represented by points within a
rectangle, and sets A, B, ... are shown by points within circles drawn
inside the rectangle. Fig. 4.2a illustrates the results of the operations
of complement, union and intersection (shaded area in each case). Too
much stress should not be laid on these diagrams; they are no more than
helpful illustrations.

Fig. 4.2a. Venn diagrams of the complement A’, the union A ∪ B and the
intersection A ∩ B.

Set theory deals with these operations on sets and with certain
relations which can be defined in terms of them. One useful relation
is A ∩ B = ∅, i.e. the intersection of the sets A and B is empty. In this
case, A and B have no elements in common; they are disjoint sets.
The relation of inclusion A ⊆ B, already introduced, can now be
expressed: A ∩ B’ = ∅, i.e. A and the complement of B are disjoint,
as in Fig. 4.2b where A and B’ are shaded. For example, if R is the set
of all rational numbers, then the definition of a real number (2.4)
divides R into two disjoint and exhaustive sets L and G: L ∪ G = R
and L ∩ G = ∅.

Fig. 4.2b. If A ⊆ B, then A ∩ B’ = ∅.

It is evident that the operations (’, ∪, ∩) are closely linked with
the verbal ideas of ‘not’, ‘or’, ‘and’ respectively. These are the words
in bold in the definition above. This will be followed up in Chapter 5.


4.3. The operational rules for sets. Meanwhile, another line of thought 
is pursued. The two binary operations (∪, ∩) are similar in many
respects to the ordinary arithmetic concepts of sums (+) and 
products ( x ). This is particularly so for sums and products of proper 
fractions: ½ + ¼ = ¾ getting ‘bigger’ like A ∪ B and ½ × ¼ = ⅛ getting
‘smaller’ like A ∩ B. On this line of approach, the two bounds (∅, U)
are similar to the numbers 0 and 1 respectively. The similarity is by 
no means complete and there are many divergencies. But it is striking
enough to pursue. The suggestion is that ∪ can be re-written as +,
∩ as ×, ∅ as 0 and U as 1.

A question now presents itself in set theory: what rules are obeyed 
by the operations of complement, union and intersection? The rules 
are got directly from the definitions as set out formally, and in terms 
of the notation ’, ∪ and ∩, in 15.3. They are to be used in this form
in most of set theory, as applied for example in Chapter 5. Here, we
try out the rules with the proposed, more familiar notation: (+, ×,
0, 1) instead of (∪, ∩, ∅, U). The notation for complement is retained.
The translation of the rules is given in the table below.

The last three rules (8, 9 and 10) relate to complements and they are 
reasonable enough. The test of the notation of + for union and x 
for intersection is the comparison of the first seven rules with the 
corresponding operational rules of ordinary algebra (2.2). It is seen 
that the proposed notation passes the test fairly well but not completely.
The seven rules all look familiar, with the exception
of rule 4 (both parts), rule 6(a) and rule 7(b). The exceptions are not 
minor ones; indeed, they are strange and might even appear silly. 


The Operational Rules for Sets
Sets A, B, C, ... with 0 as the empty and 1 as the universal set


Rule               Union (+)                         Intersection (×)
1.  Closure        (a) A + B is a set                (b) A × B is a set
2.  Associative    (a) A + (B + C) = (A + B) + C     (b) A × (B × C) = (A × B) × C
3.  Commutative    (a) A + B = B + A                 (b) A × B = B × A
4.  Idempotent     (a) A + A = A                     (b) A × A = A
5.  Identity       (a) A + 0 = A                     (b) A × 1 = A
6.  Identity       (a) A + 1 = 1                     (b) A × 0 = 0
7.  Distributive   (a) A × (B + C) = A × B + A × C   (b) A + (B × C) = (A + B) × (A + C)
8.  Complements    (a) A + A’ = 1                    (b) A × A’ = 0
9.  Complements    (a) (A + B)’ = A’ × B’            (b) (A × B)’ = A’ + B’
10. Involution     (A’)’ = A
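The rules can be tested exhaustively on a small universal set. The following sketch checks rules 4 to 10, with + read as union and × as intersection, over every subset of a three-element set; the universal set chosen and all names are illustrative assumptions. A test of this kind supports the rules but is no substitute for deriving them from the definitions.

```python
from itertools import combinations

def subsets_of(universe):
    """All subsets of the given universal set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def rules_hold(universe):
    one = frozenset(universe)    # universal set, written 1
    zero = frozenset()           # empty set, written 0
    sets = subsets_of(universe)

    def comp(s):
        return one - s

    for A in sets:
        if not ((A | A) == A and (A & A) == A):                    # rule 4
            return False
        if not ((A | zero) == A and (A & one) == A):               # rule 5
            return False
        if not ((A | one) == one and (A & zero) == zero):          # rule 6
            return False
        if not ((A | comp(A)) == one and (A & comp(A)) == zero):   # rule 8
            return False
        if comp(comp(A)) != A:                                     # rule 10
            return False
        for B in sets:
            if comp(A | B) != (comp(A) & comp(B)):                 # rule 9(a)
                return False
            if comp(A & B) != (comp(A) | comp(B)):                 # rule 9(b)
                return False
            for C in sets:
                if (A & (B | C)) != ((A & B) | (A & C)):           # rule 7(a)
                    return False
                if (A | (B & C)) != ((A | B) & (A | C)):           # rule 7(b)
                    return False
    return True

print(rules_hold({1, 2, 3}))               # True
# Rule 7(b) fails for ordinary numbers:
print(2 + (3 * 4) == (2 + 3) * (2 + 4))    # False (14 against 30)
```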


The algebra of sets, therefore, is not the same as the algebra of 
ordinary numbers. It is an example of what is known as Boolean 
Algebra, after Boole (1815-64). We have a choice here on the notation 
to adopt and it is a choice which appears elsewhere (e.g. in matrix 
algebra). We can (as in 15.3) make use of new and strange symbols,
particularly ∪ for union and ∩ for intersection of sets. In this way,
we separate Boolean Algebra completely from ordinary algebra and 
we choose to ignore the similarities between the two. On the other 
hand (as here), we can retain the symbols we are used to, writing + 
for union and x for intersection of sets. In this case, we depend on the 
similarities between Boolean and ordinary algebra. At the same time, 
we have to remember that not all the familiar rules of + and x are 
obeyed when these symbols are applied to sets. 

It is not easy to make the choice. Generally speaking, however, we 
find much advantage in sticking to the familiar notation. We 
economise in symbols and, for most of the time, we are in well-known 
territory. But we have to clear our minds of the idea that all the rules 
of ordinary algebra have to be obeyed. It is useful to consolidate this 
position here and now. Later (in matrix algebra) we will find that 
the same problem arises; the established notation does make use of
+ and ×, despite the fact that the operational rules are not all
valid.


4.4. Boolean Algebra. If the algebraic operations with certain
entities satisfy the set of rules set out in 4.3, for appropriate definitions
of complement, union and intersection, then the algebra is
Boolean. It is the algebra obeyed by sets, as developed above. It is, 
moreover, an algebra with a considerable range of application, as in 
Chapter 5 where the complement, union and intersection of sets are 
related to verbal statements in ‘not’, ‘or’, and ‘and’. This is, however, 
not the only Boolean Algebra which can be devised, as a simple 
example demonstrates. As in 2.8, consider a set of two elements 
{0, 1} subject to the operations: 


Complements: 0’ = 1 and 1’ = 0


Addition:    + | 0  1        Multiplication:    × | 0  1
             --+------                          --+------
             0 | 0  1                           0 | 0  0
             1 | 1  1                           1 | 0  1

All the operational rules of 4.3 are then satisfied (see 4.9 Ex. 11). 

A more detailed examination of the operational rules of Boolean 
Algebra is now made, to see to what extent they are in line with those 
of ordinary algebra, and in what respects they differ. We accept as 
familiar and/or very reasonable all the rules of 4.3, with three ex- 
ceptions. Those which are ‘different’ in one way or another are rules 
4 and 6(a) on the one hand, and the distributive rule 7 on the other. 

When extended (as it obviously can be) to several terms, rule 4 
gives: 

A + A + A + ... (n terms) = A and A × A × A × ... (n terms) = A.


On the ordinary rules of algebra, we expect nA and Aⁿ respectively.
The difference here is that the Boolean rule is the simpler. Similarly, 
rule 6(a) is strange: A +1=1, i.e. ‘adding’ anything to the universal 
set 1 leaves it unchanged. This is, however, exactly parallel to the 
rule 6(b): A × 0 = 0, i.e. ‘multiplying’ the empty set by anything
leaves it unchanged. We are used to the second, a property of zero; 
but we are not used to the parallel property applied to unity. The 
difference here is that the Boolean rule is the more symmetrical. 

The feature of symmetry in Boolean Algebra is even more remarkable
in rule 7, the distributive rule. One part is familiar and acceptable,
the distribution of × over +: A × (B + C) = A × B + A × C.
The other part is strange, the distribution of + over ×:


A + (B × C) = (A + B) × (A + C).


For any usual system of numbers, this second part of the rule is just
not true, e.g. 2 + (3 × 4) ≠ (2 + 3) × (2 + 4). For sets, and Boolean
Algebra generally, both parts are true and the one is exactly parallel 
to the other, obtained by interchanging the operations (+ and x ). 
Hence the leading question: which should we naturally expect, one 
distributive rule or two? 

Pausing to take stock, we observe that the Boolean rules are indeed 
completely symmetrical. They have the property of duality: if the 
operations + and x are interchanged, and if (at the same time) 0 
and 1 are interchanged, then any one of the rules is transformed into 
its pair. So, for example, A + 0 = A changes into A × 1 = A, and
A + 1 = 1 into A × 0 = 0. It is in this way that the one distributive
rule transforms into the other. Surely, this symmetry or duality is a 
very desirable feature of a system. Boolean Algebra has it; ordinary 
algebra does not. It can only be concluded that, in ordinary algebra, 
we do not realise what we are missing. Without knowing it, we have 
put up with an unsymmetrical system; and we have more complicated
rules of repeated addition and multiplication than we need:


A + A + A + ... = nA; A × A × A × ... = Aⁿ.

Boolean Algebra is neater, more symmetrical and simpler:

A + A + A + ... = A; A × A × A × ... = A.


However, there is something to be put in the scales on the other 
side. Boolean Algebra, like ordinary algebra, has identities 0 and 1. 
The rules of Boolean Algebra, unlike those of ordinary algebra, say 
nothing about inverses (negatives and reciprocals), or even about 
cancellation. This need cause no concern as far as negatives and 
hence differences go. Provided only that B ⊆ A, we can write the
difference A − B between the sets A and B. First, denote B’ = 1 − B,
i.e. B’ is also the difference between the universal set and B.
Then, note:


AB’ = A(1 − B) = A − AB = A − B (B ⊆ A)


all represent the same thing, the set consisting of those and only
those elements of A which are not in B. This is illustrated in the Venn
Diagram of Fig. 4.4.
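The identity between A × B’ and the difference A − B can be seen on a concrete case. In this sketch the universal set and the sets A and B (with B a subset of A) are arbitrary illustrative choices.

```python
U = frozenset(range(10))          # illustrative universal set, written 1
A = frozenset({1, 2, 3, 4, 5})
B = frozenset({2, 4})             # chosen so that B is a subset of A

comp_B = U - B                    # B' = 1 - B
print(B <= A)                     # True
print((A & comp_B) == (A - B))    # True: AB' and A - B are the same set
print(sorted(A - B))              # [1, 3, 5]
```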




Hence, the lack in Boolean Algebra is effectively reduced to the 
following: there are no reciprocals (and hence no division) and 
cancellation is not valid. Given a set A, there is no set B such that 
A x B=1.* Moreover, it is not true that: if A x B=0, then either 
A=0 or B=0. It is true that, if either A or B is 0 (empty), then 
A x B must also be 0 (empty). It is the converse which fails. Indeed: 

Ax B=0 implies A and B disjoint. 
In Boolean Algebra, there are no reciprocals; even worse, there are 
divisors of zero, i.e. any disjoint sets have a product (intersection) 
which is zero (empty). 

While this is a definite lack in Boolean Algebra, it is certainly not 
the only system with the defect. There are even systems of numbers 
equally defective, e.g. the rather sophisticated algebra of integers
(mod n) where n is not prime (see 2.7). So, in the algebra of {0, 1, 2, 3}
(mod 4), we have 2 × 2 = 0. It all depends whether we are happy
without reciprocals and a cancellation rule; if so, Boolean Algebra
has much to recommend it.
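Divisors of zero among the integers (mod n) can be listed directly. A small sketch, with a hypothetical function name, which searches for non-zero a having some non-zero b with a × b = 0 (mod n):

```python
def zero_divisors(n):
    """Non-zero residues a (mod n) for which a*b = 0 (mod n) with some non-zero b."""
    return sorted({a for a in range(1, n) for b in range(1, n) if (a * b) % n == 0})

print(zero_divisors(4))   # [2]        since 2 x 2 = 0 (mod 4)
print(zero_divisors(6))   # [2, 3, 4]
print(zero_divisors(7))   # []         none when n is prime
```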


4.5. Counting sets. Having established the rules for operating with 
sets, we can turn to another promising question: how do we count 
how many elements a set has? This is indeed a basic, if not primitive, 
idea. We find, however, that we are soon on unfamiliar ground. 
The concept of counting leads on to that of the infinite, and here we 
need to be both cautious and precise. 

It can be agreed at the outset that counting has its ordinal aspect, 
i.e. ticking off in sequence 1, 2, 3, ..., and that this aspect has a very 
close link with the ordered set J+ of positive integers. This is, how- 
ever, not the basic property of counting. To count a set in this ordinal 
way, we need to be able to arrange the set (or at least part of it) in 
sequence; we do not know whether this can be done at all (see the 
‘axiom of choice’, 4.8 below) or, if it can be done, whether it can be 
done uniquely. A much more fundamental view of counting is from 
its cardinal aspect. A set is simply a collection of elements, no idea of 
order being involved. We want to count it. 

* Except (as always) that 1 is the reciprocal of itself: 1 × 1 = 1.

The essential idea, from which counting derives, is that of one-one
correspondence, a most far-reaching concept in mathematics. In
everyday terms, one-one correspondence is simply ‘matching’. When
we say that two sets have the same number of elements, the same 
count, we mean that the elements of one set can be matched off 
against those of the other. For example, a set of three eggs in a 
basket can be matched against a set of three sticks on the ground: 
each egg paired off with one stick and conversely. The pairing may 
also be done indirectly, e.g. if the basket of eggs is in one place and 
the sticks in another. The eggs can be matched off against a set of 
three fingers and then the sticks matched off against the same set of 
fingers. All these sets have the same number of elements; the fact, 
that we say that the number is 3, is incidental at this stage. So: 


DEFINITION: Two sets A and B can be put into one-one correspondence,
written A ~ B, if each element of A can be associated with just one
element of B and conversely. A and B are then equi-numerous and have
the same cardinal number.

There is nothing in this definition to imply that the sets A and B are
finite. The illustration of three eggs and three sticks is an instance of 
a finite (cardinal) number. The concept of one-one correspondence, 
and of cardinal numbers, applies to infinite sets just as well, as illus- 
trated by the following example. 

The set of even positive integers may be thought to be half as 
numerous as the set of all positive integers. For a finite set of each, 
this may be so, roughly. For example, {2, 4, 6, 8}, the even integers 
less than 10, has 4 elements; {1, 2, 3, ... 9}, the integers less than 10, 
has 9 elements. Any matching of these sets leaves some left over in 
the second and ‘larger’ set. All such ‘end effects’, left-overs in the 
matching process, disappear when all even integers and all integers 
are matched. There is a simple one-one correspondence: to each 
integer n match the even integer 2n and conversely. The sets of even
integers and of all integers are equi-numerous and have the same 
(infinite) cardinal number. 
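The matching of n with 2n can be written as a pair of mutually inverse rules. A program can only exhibit the matching on a finite stretch of integers, so the sketch below illustrates the correspondence rather than proving it; the function names are hypothetical.

```python
def to_even(n):
    """Match the positive integer n with the even integer 2n."""
    return 2 * n

def from_even(m):
    """The converse matching: the even integer m goes back to m / 2."""
    return m // 2

first = range(1, 11)
print([to_even(n) for n in first])                       # [2, 4, 6, ..., 20]
print(all(from_even(to_even(n)) == n for n in first))    # True: nothing is left over
```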

On this definition, each and every set has its cardinal number. If 
there are no sets to be matched with it, the cardinal number is 
unique. But otherwise (and generally) all sets which can be put in 
one-one correspondence (A~B) have the same cardinal number. 
Note that counting and cardinal numbers are confined (at least at 
this stage) to sets. We must not slip into the habit of saying that 
(for example) 3 lbs. of sugar or 3 feet of rope are equi-numerous
with 3 eggs or 3 sticks. The sugar and the rope are not sets and the
number 3 attached represents a more developed idea (measurement) 
than that of counting. 

The equi-numerous relation A ~ B is a first case of an ‘equivalence
relation’, met later in a more general context (Chapter 7). As such, 
the following properties of equi-numerous sets A, B, C,... are of 
interest: 


(1) Reflexive: A~A, (2) Symmetric: If A~ B, then B~ A, 
(3) Transitive: If A~ B and B~C, then A~C. 


The arithmetic of cardinal numbers, a, b, ..., can then be constructed
in terms of the equi-numerous sets to which each corresponds. Only
two simple pieces of the arithmetic need be exhibited here.

Let a be the cardinal number of the set A (and all equi-numerous
sets) and let b be the cardinal number of the set B (and all equi-numerous
sets). Define the relation a ≤ b (a less than or equal to b) as
holding if: A ~ subset of B. But: a = b if and only if A ~ B. It does
not follow, however, that if A ~ proper subset of B, then a < b. It is
still possible that this proper subset of B is equi-numerous with B
itself and, hence, that A ~ B and a = b. Ample illustration can be given
of cases where a set can be put into one-one correspondence with a
proper part of itself. It is enough here to note the example given
above; the set of even positive integers is a proper subset of the set of
all positive integers, and the two sets are equi-numerous. Hence, as
long as we can put a set A into one-one correspondence with a
subset of B (proper or not), we can write a ≤ b for the corresponding
cardinal numbers. It is only possible to write a < b, if a ≤ b and if we
can establish otherwise that a = b is not true.

Next, define the sum (a + b) of two cardinal numbers as the cardinal
number of the set A + B formed as the union of two disjoint sets A
and B with cardinal numbers a and b respectively. This is straightforward;
the need for A and B to be disjoint if the numbers of
elements are to be added is clear enough. What does need proof,
however, is that the definition of (a + b) is independent of the choice
of A and B from all the possible disjoint sets with the cardinal
numbers a and b. The proof, fortunately, is easy. Suppose A₁ and B₁
are another pair of disjoint sets with cardinal numbers a and b
respectively. Then A ~ A₁ and B ~ B₁. Combining these two one-one
correspondences, we deduce that (A + B) ~ (A₁ + B₁). Hence the
cardinal number (a + b) is uniquely defined, from a and b.

It is possible to develop all the properties and operational rules of 
cardinal numbers from the definition in this way, i.e. without any 
reference whatever to the ordered set of positive integers. However, 
sooner or later, we wish to associate cardinal numbers and integral 
numbers, the one defined here in relation to equi-numerous sets, the 
other developed on the lines of 2.6. The sets, of three eggs and of 
three sticks, have the same cardinal number ‘three’; we cannot hide 
the fact that this is equivalent in some way to the positive integer 3. 
Consequently, instead of proceeding further with cardinal numbers 
in the abstract, we cut a corner and use the ordered sequence of 
positive integers J+ as the yardstick for counting. We lose some fine 
distinctions but we get to usable results more quickly. We must 
remember, however, that the essence of counting is the one-one 
correspondence; ordering is subsidiary. 


4.6. Finite sets. The order of the set J+ of positive integers is such
that m < n (or n > m) means that m is before n in the order and that
the difference (n - m) is defined as a positive integer of J+. So ‘less
than’ or ‘greater than’ is simply a property of order, of a positive
difference. Denote a segment or section S_n of J+ as the subset


{1, 2, 3, ... n} = {p | p ≤ n},


i.e. all positive integers before n (including n itself) in the order of
J+. Then a set A which is equi-numerous with S_n for some n can be
called finite:


DEFINITION: A set A is finite if A ~ S_n for some section S_n of J+.
We can then write A = {a_1, a_2, a_3, ... a_n}. This follows from the fact
that the elements of A can be associated (in one-one correspondence)
with the integers 1, 2, 3, ... n, i.e. they can have these integers
attached as subscripts.

It is tempting to say, right away, that A has n elements, i.e. that
A’s count or cardinal number is n, as given by the section S_n with
which A is linked. The temptation must be resisted. It has not yet
been shown that the linking of A with n is unique; only that it exists
for some n. We need the important result:




THEOREM: If a finite set A is a proper subset of another set B, then
A ~ B is impossible; A and B cannot be equi-numerous.


To prove, suppose that A ~ B and that A = {a_1, a_2, ... a_n}. Hence,
B consists of the same elements and (at least) one other a_{n+1} (since
A ⊂ B). We have to show that this is impossible. The proof is by
mathematical induction (see 2.6), i.e. we show that it is impossible
for n = 1 and then, if taken as impossible for (n - 1), it is also
impossible for n.* The first part is easy: If n = 1, A is a_1 alone and B
has (at least) a_1 and a_2 as elements. No one-one correspondence for
A ~ B is possible. For the second part, take A ~ B as impossible
for (n - 1). For n, assume A ~ B with A = {a_1, a_2, ... a_n} and B as
the same elements plus a_{n+1} at least. Since A ~ B, there is a one-one
correspondence between A and B, which can always be arranged
(if necessary by switching two elements in A) so that a_n in A
corresponds to a_{n+1} in B. Remove a_n from A, a_{n+1} from B. There is
still a one-one correspondence between {a_1, a_2, ... a_{n-1}} from A and
the same elements plus a_n in B. It is this which we have taken as
impossible. So, if impossible for (n - 1), it is impossible for n. The
proof by induction is complete. Q.E.D.

One consequence of the theorem is immediate: a finite set A
cannot be equi-numerous with two sections S_n and S_m of J+ (n ≠ m).
Suppose m < n, so that S_m is finite and a proper subset of S_n. If
A ~ S_n and A ~ S_m is possible, then S_m ~ S_n which is ruled out by the
theorem. Hence

A ~ S_n ~ S_m if and only if n = m.

It follows that the integral n of S_n, in the definition of A finite, is
unique. In writing A = {a_1, a_2, ... a_n} for a finite set, the integer n is
unique; it is the cardinal number of A. All equi-numerous sets can
be written:

A = {a_1, a_2, ... a_n}; B = {b_1, b_2, ... b_n}; ...

the cardinal number n being unique, the same for each.

* This form of mathematical induction, when spelled out, implies: impossible for
n = 1, hence impossible for n = 2, hence impossible for n = 3, ... and so impossible,
generally, for any n.


Since we can now count a finite set, we can proceed to operate
with the various counts of finite sets A, B, C, ... Addition of numbers
(of elements) is the most important of the operations. Write
n(A) and n(B) as the number of elements in the sets A and B
respectively. What is n(A + B) where A + B is the union of A and
B? By the definition of addition of cardinal numbers (4.5):

n(A + B) = n(A) + n(B) if A and B are disjoint.

We can add the numbers of elements in two sets, if they are disjoint,
i.e. if they do not overlap. So far, so good (and so obvious). The
extension to the result when A and B do overlap is a matter of
picking out suitable disjoint sets before adding numbers of elements.
The Venn Diagram of Fig. 4.6a shows disjoint sets making up A, B
and (A + B). Here AB is written for A x B or A∩B. So:

n(A + B) = n(AB) + n(AB') + n(A'B)
n(A) = n(AB) + n(AB')
n(B) = n(AB) + n(A'B).

Hence:

n(A + B) = n(A) + n(B) - n(AB) ...... (1)

[Fig. 4.6a: Venn Diagram of two sets A and B, with the regions
1 = AB, 2 = AB', 3 = A'B, 4 = A'B']


Again, this is sensible enough for, in writing n(A) + n(B), we count
the common elements, n(AB) in number, twice over.
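
The rule (1) is easily checked on a small case. The following is a minimal sketch in Python, using two sets invented purely for illustration (the element labels are assumptions, not taken from the text). It counts the union directly, by the formula (1), and by adding the three disjoint pieces AB, AB' and A'B:

```python
# Two small overlapping sets, chosen only for illustration.
A = {"a", "b", "c", "d"}
B = {"c", "d", "e"}

# n(A + B) counted directly from the union.
n_union = len(A | B)

# n(A) + n(B) - n(AB), i.e. formula (1).
n_formula = len(A) + len(B) - len(A & B)

# The three disjoint pieces AB, AB' and A'B of the Venn Diagram.
pieces = [A & B, A - B, B - A]
n_pieces = sum(len(p) for p in pieces)

print(n_union, n_formula, n_pieces)  # all three counts agree
```

The common elements are counted once in each of the three methods; it is only the naive sum n(A) + n(B) that counts them twice.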

Two sets A and B divide the totality of elements into four disjoint 
and exhaustive subsets, numbered 1, 2, 3 and 4 in the Venn Diagram. 
The same process for three sets A, B and C requires eight disjoint 
and exhaustive subsets of the totality of elements, as shown by the 
Venn Diagram of Fig. 4.6b. Exactly as before, it follows that: 

n(A + B + C) = n(A) + n(B) + n(C) - n(AB) - n(BC) - n(AC) + n(ABC) ...... (2)

The result (2) can be regarded as repeated applications of (1). More
directly, if the numbers on the right-hand side of (2) are given, then
the numbers in all other sets and combinations of sets can be
obtained by a process of differencing. In particular, the numbers in
the sets labelled 1, 2, 3, ... 8 in the Venn Diagram are obtained in
sequence:

n(ABC) = given
n(ABC') = n(AB) - n(ABC) given
n(BCA') = n(BC) - n(ABC) given
......

[Fig. 4.6b: Venn Diagram of three sets A, B and C, with the regions
1 = ABC, 2 = ABC', 3 = BCA', 4 = ACB', 5 = AB'C', 6 = BA'C',
7 = CA'B', 8 = A'B'C']




It is required that each of these differences is non-negative, i.e. that 
the given numbers are consistent. The following examples illustrate 
two cases where the process turns out to be consistent and inconsistent 
respectively : 

(i) A market research team interviews 100 people, asking each
whether he smokes any or all of the items, A: cigarettes, B: cigars,
C: pipe tobacco. For one individual, the answer could be none, any
one of them, any pair of them or all three of them. The team returns
the following numbers of over-lapping categories:


Category                       No.    Category         No.
ABC  All three                   3    A  Cigarettes     42
AB   Cigarettes and cigars       7    B  Cigars         17
BC   Cigars and pipe             8    C  Pipe           27
AC   Cigarettes and pipe        12    Total cases      100


It is required to unscramble these returns, by elimination of
over-lapping, and to find (e.g.) the number of people who smoke cigarettes
and/or cigars but not a pipe. In particular, the number of
non-smokers is required. In the course of doing this, it can be determined
that the returns are consistent. The Venn Diagram of Fig. 4.6c
has eight disjoint categories and the number in each is in sequence:

n(ABC) = 3 (given)
n(ABC') = n(AB) - n(ABC) = 7 - 3 = 4
n(BCA') = n(BC) - n(ABC) = 8 - 3 = 5
n(ACB') = n(AC) - n(ABC) = 12 - 3 = 9
n(AB'C') = 42 - (3 + 4 + 9) = 26
n(BA'C') = 17 - (3 + 4 + 5) = 5
n(CA'B') = 27 - (3 + 5 + 9) = 10


Consequently, entering the numbers in the diagram, we find the
number of cigarette and cigar smokers who do not smoke a pipe:
26 + 4 + 5 = 35. The number of non-smokers: n(A'B'C') = 38. This
can be checked by means of the formula (2):

n(A + B + C) = 42 + 17 + 27 - 7 - 8 - 12 + 3 = 62

and n(A'B'C') = 100 - n(A + B + C) = 38.
(ii) Suppose the team made the returns as in (i), except for the
entries:

BC Cigars and pipe 13
AC Cigarettes and pipe 18.

There is still no reason, on the face of it, to expect inconsistent
returns. The split into eight disjoint categories proceeds as before:

n(ABC) = 3
n(ABC') = 7 - 3 = 4
n(BCA') = 13 - 3 = 10
n(ACB') = 18 - 3 = 15
n(AB'C') = 42 - (3 + 4 + 15) = 20
n(BA'C') = 17 - (3 + 4 + 10) = 0
n(CA'B') = 27 - (3 + 10 + 15) < 0 inconsistent

i.e. the returns on pipe smoking are inconsistent. The figures are
entered on a Venn Diagram as in (i), until an inconsistency (negative
residual) appears.
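
The differencing process of the two examples is mechanical and can be sketched in a few lines of Python. The function names `split_three_sets` and `is_consistent` are invented here for illustration; the arithmetic is that of the text, applied to the returns of (i) and (ii):

```python
def split_three_sets(total, nA, nB, nC, nAB, nBC, nAC, nABC):
    """Split over-lapping returns for three sets into the eight
    disjoint categories, by the differencing process of 4.6."""
    r = {}
    r["ABC"] = nABC
    r["ABC'"] = nAB - nABC
    r["BCA'"] = nBC - nABC
    r["ACB'"] = nAC - nABC
    r["AB'C'"] = nA - (r["ABC"] + r["ABC'"] + r["ACB'"])
    r["BA'C'"] = nB - (r["ABC"] + r["ABC'"] + r["BCA'"])
    r["CA'B'"] = nC - (r["ABC"] + r["BCA'"] + r["ACB'"])
    # Whatever is left of the total belongs to none of A, B, C.
    r["A'B'C'"] = total - sum(r.values())
    return r


def is_consistent(regions):
    """The returns are consistent if no residual is negative."""
    return all(count >= 0 for count in regions.values())


case_i = split_three_sets(100, 42, 17, 27, 7, 8, 12, 3)
case_ii = split_three_sets(100, 42, 17, 27, 7, 13, 18, 3)

print(case_i, is_consistent(case_i))
print(case_ii, is_consistent(case_ii))
```

For (i) every residual is non-negative and the last category gives the 38 non-smokers; for (ii) the residual for pipe smokers only, CA'B', is negative and the check fails.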


4.7. Countably infinite sets. An infinite set is simply a set which is not
finite, i.e. a set which cannot be put into one-one correspondence
with any section of the set J+ of positive integers. As a corollary of
the theorem of 4.6, it follows that the set J+ is infinite. For, if J+ is
finite, then S_n ~ J+ for some section S_n of J+. But S_n is a proper
subset of J+; this contradicts the theorem.*

The set J+ = {1, 2, 3, ... n, ...}, ordered in sequence, is the simplest
kind of infinite set and (intuitively) it must have the ‘lowest’ count
of all infinite sets. However this may be, consider infinite sets
A, B, C, ... each of which can be put into one-one correspondence
with J+ (A ~ B ~ C ~ ... ~ J+). They are called denumerable or
countably infinite. They all have the same cardinal number which
can be written d (for denumerable). From the definition of 4.5, it
follows that n ≤ d for any finite cardinal number (positive integer) n
since a subset of J+ (i.e. S_n) ~ S_n. Moreover, d ≠ n, since d = n implies
that there is a one-one correspondence between J+ and S_n, or that J+
is finite. Hence n < d for any integral n. Finally, if A is countably
infinite, the one-one correspondence with J+ means that the elements
of A can be denoted with subscripts 1, 2, 3, ... n, .... All this can be
summed up:

DEFINITION: A set A is countably infinite if A ~ J+. It can be denoted

A = {a_1, a_2, a_3, ... a_n, ...} and its cardinal number is d > n.


* Strictly, an axiom is required in set theory to ensure that infinite sets do exist.
The axiom (see 15.3) can be simply that all natural numbers form a set J+.


Finite and countably infinite sets together can be described as
countable. To tidy up the notation and summarise the position, we
can say (as can easily be established) that a set A is countable if and
only if it can be written A = {a_1, a_2, a_3, ...} with distinct elements
having distinct subscripts. A is finite if there is a last element a_n, the
number of elements being n. A is countably infinite if the sequence
continues indefinitely, the number of elements being d.

The theorem of 4.6 holds only for finite A. If A is infinite and a 
proper subset of B, then A ~ B is still a possibility. It may be that an 
infinite set can be put into one-one correspondence with a proper 
subset of itself. This is ‘may be’; it has not been established that it 
‘must be’. However, for countably infinite sets, it is ‘must be’, in 
virtue of the result: 


THEOREM: Any countably infinite set A can be put into one-one
correspondence with some proper part A_1 of itself.


This apparent paradox (sometimes known as the Paradox of Galileo,
after Galileo, 1564-1642) is easily illustrated. The set J+ = {1, 2, 3, ...}
is countably infinite, as is the set {2, 4, 6, ...} of even positive integers
which can be put into one-one correspondence with J+. Hence J+
can be put into one-one correspondence with a proper subset of
itself. The general proof of the theorem is also simple. Let


A = {a_1, a_2, a_3, ...}


and write a proper subset A_1 = {a_2, a_3, a_4, ...}, i.e. A itself without the
first element a_1. There is an obvious one-one correspondence (a_1 with
a_2, a_2 with a_3, ...) between A and A_1. This proves the theorem. All
that has been demonstrated (or indeed illustrated) is that there is
some countably infinite set contained within any given countably
infinite set. It is clear, however, that there are many such, e.g.
within J+ the following proper subsets are all countably infinite:


{2, 3, 4, ...}, {2, 4, 6, ...}, {1, 3, 5, ...}, {2, 4, 8, ...}.


The range of countably infinite sets is remarkable. The following 
very broad result demonstrates how they can be ‘manufactured’ one 
from another: 


THEOREM: The union of a countable set of countable sets is itself
countable. 


Proof: let A_1 = {a_11, a_12, a_13, ...}, A_2 = {a_21, a_22, a_23, ...},
A_3 = {a_31, a_32, a_33, ...}, ... be the given sets. Throw all the elements
together to get the union A = A_1 + A_2 + A_3 + ... and A contains a_rs,
the sth element of A_r, for any integral r and s. There is one element
a_rs with r + s = 2, two elements a_rs with r + s = 3, ... and a finite
number of elements with r + s = n (any positive integer). Put the
elements a_rs into this sequence, re-labelling them with one subscript
only. The elements are thus countable. It does not matter whether
A_1, A_2, A_3, ... are disjoint or not; if not, the suppression of the
repeated elements does not affect essentially the sequence of the
elements of A. Q.E.D.
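
The re-labelling used in the proof can be made concrete. The following is a minimal sketch in Python, with invented names `diagonal_pairs` and `enumerate_union`; the illustrative choice a_rs = r x s (so that A_r is the set of multiples of r, and the sets overlap heavily) is an assumption made only for the example:

```python
from itertools import count, islice


def diagonal_pairs():
    """Yield the subscript pairs (r, s), r and s >= 1, in the order
    r + s = 2, then r + s = 3, then r + s = 4, and so on."""
    for total in count(2):
        for r in range(1, total):
            yield r, total - r


def enumerate_union(element, limit):
    """Return the first `limit` distinct elements a_rs = element(r, s),
    taken along the diagonals and suppressing repeated elements.
    (Loops for ever if the union has fewer than `limit` elements.)"""
    seen, sequence = set(), []
    for r, s in diagonal_pairs():
        x = element(r, s)
        if x not in seen:
            seen.add(x)
            sequence.append(x)
            if len(sequence) == limit:
                return sequence


first_pairs = list(islice(diagonal_pairs(), 6))
first_elements = enumerate_union(lambda r, s: r * s, 6)
print(first_pairs)
print(first_elements)
```

Every pair (r, s) is reached after a finite number of steps, which is all that the proof requires; the suppression of duplicates only shortens the sequence.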
As special cases of the theorem, we can write the union of a finite
number of finite sets (itself finite), or the union of a finite number
of countably infinite sets (itself countably infinite), or the union of a
countably infinite number of countably infinite sets (still countably
infinite). This helps to explain why countably infinite sets are of such
frequent occurrence. J+ (positive integers) is countably infinite, and
so is J (all integers) which is the union of two countably infinite sets
(the positive, the negative integers) together with zero. Moreover,
both the set of rationals R, and the set F[x] of all polynomials f(x)
with rational coefficients and undefined x, are countably infinite.
Consider these in turn, in order to see how (following the lines of the
general proof above) the whole set can be arranged in sequence.
Within R, the positive rational numbers can be put into sequence:


1/1, 2/1, 1/2, 3/1, 2/2, 1/3, 4/1, 3/2, 2/3, 1/4, ...

Every positive rational fits into its place. The rational p/q is the nth
element in the sequence, where n = q + ½(p + q - 1)(p + q - 2). The
elimination of duplication (e.g. 2/2 or 4/2) makes no difference; a
countably infinite sequence remains. The inclusion of the negative
rationals (and zero) does not affect the result; two countably infinite
sets are combined. R is arranged as a countably infinite sequence.
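
The formula for the place of p/q can be checked against the sequence itself. A minimal sketch in Python follows; the names `position` and `sequence` are invented for the illustration, and duplicates such as 2/2 are deliberately kept, as in the text:

```python
from itertools import count, islice


def position(p, q):
    """Place n of p/q in the sequence: n = q + (p+q-1)(p+q-2)/2."""
    s = p + q
    # (s - 1)(s - 2) is a product of consecutive integers, hence even.
    return q + (s - 1) * (s - 2) // 2


def sequence():
    """Yield (p, q) for 1/1, 2/1, 1/2, 3/1, 2/2, 1/3, ... by diagonals
    p + q = 2, 3, 4, ..., with q increasing along each diagonal."""
    for s in count(2):
        for q in range(1, s):
            yield s - q, q


first_ten = list(islice(sequence(), 10))
print(first_ten)
print([position(p, q) for p, q in first_ten])
```

The term ½(p + q - 1)(p + q - 2) counts the fractions on all earlier diagonals, and q gives the place on the diagonal p + q itself.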
The set F[x] of polynomials with rational coefficients is an
example of a countably infinite set of countably infinite sets. Consider,
first, polynomials with integral coefficients. Those of zero degree can
be put in sequence:

0, +1, -1, +2, -2, +3, -3, ...

Adding the linear polynomials (of degree one), we get a square array:

0     +1      -1      +2      -2 ...
x     x+1     x-1     x+2     x-2 ...
2x    2x+1    2x-1    2x+2    2x-2 ...
3x    3x+1    3x-1    3x+2    3x-2 ...
...

A single sequence is obtained by running up and down diagonals:

0, x, +1, -1, x+1, 2x, 3x, 2x+1, x-1, +2, -2, x+2, ...


This is made into an array by adding the x^2 terms for quadratic
polynomials (of degree 2), again turned into a sequence, and so on.
In the end the whole set of polynomials appears as a single sequence;
it is countably infinite. Finally, a polynomial with rational
coefficients is a rational multiple of a polynomial with integral coefficients.
The countably infinite set of the latter is repeated a countably
infinite number of times (once for each rational) to give F[x]. Hence,
F[x] is countably infinite.


4.8. Transfinite arithmetic. The arithmetic of finite cardinal numbers 
is simply the arithmetic of the positive integers. A new number is to 
be added, the cardinal number d corresponding to countably infinite 
sets and shown after the sequence of finite integers: 


1, 2, 3, ..., n, ... d (where all n < d).


The arithmetic can be appropriately extended, as will be done here 
for the operation of addition. Again something new is to be expected. 
From 4.5, the formal process of adding two cardinal numbers (in- 
finite as well as finite) is no problem. It is only a matter of forming the 
union of appropriate disjoint sets and of writing the cardinal number 
of the union. The difficulty, rather, lies in the interpretation of this 
transfinite arithmetic. One point is evident; since the arithmetic of 
cardinal numbers reflects operations with sets, it is to be expected 
that it will display features of Boolean rather than ordinary algebra. 

As a consequence of the second theorem, of 4.7, it follows that, for 
any positive integer n and the cardinal number d: 


d + n = d; d + d = d; d + d + d + ... = d ...... (1)


There can be a countably infinite set of d’s in the left-hand side of 
the last result of (1). To prove, it is only necessary to write approp- 
riate disjoint sets, to form their union and to use the theorem. 

This is not the end of the story. The ‘infinite’ is not a simple
concept; it is found to have a structure. Despite the fact that so many
infinite sets are countable, it is easy enough to find one that is not.
The set R* of real numbers contains within it many proper subsets
which are countably infinite, e.g. the set R of rationals and the set
R(√2) obtained by adjunction of √2 as in 2.3 (4.9 Ex. 18). But the
real numbers are so thick on the ground that, no matter how many
countably infinite subsets are removed, there are always as many
and more left. The result, due to Cantor (1845-1918), is:


THEOREM: The set R* of real numbers is not countably infinite. 


Proof: start with all real numbers x between 0 and 1 (0 < x < 1).
Each of them can be written as a decimal, in general not terminating.
Suppose that the real numbers are countably infinite so that they
can be written in a sequence: x_1, x_2, x_3, .... In decimal form:


x_1 = 0 . a_11 a_12 a_13 ...
x_2 = 0 . a_21 a_22 a_23 ...
x_3 = 0 . a_31 a_32 a_33 ...
...


where all the a’s are digits from the set {0, 1, 2, ... 9}. Consider the
diagonal of digits: a_11, a_22, a_33, ... a_nn, ... which is countably infinite.
Form a new sequence of digits: b_1, b_2, b_3, ... b_n, ... where


b_n = 1 if a_nn = 0 and b_n = a_nn - 1 if a_nn ≠ 0.

Then each of the b_n is different from the corresponding a_nn. Write

b = 0 . b_1 b_2 b_3 ...


which is a real number (between 0 and 1). However, b is different
from each of the real numbers x_1, x_2, x_3, ..., i.e. it differs from x_n at
least in the nth decimal place (since b_n ≠ a_nn). This contradicts the
assumption that the real numbers (between 0 and 1) are countably
infinite, all being comprised in the sequence x_1, x_2, x_3, .... Hence the
real numbers between 0 and 1 are not countable. Similarly, the
(double) set of real numbers between -1 and 1 is not countable.
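
The diagonal construction can be tried on a finite piece of such a list. The following is a minimal sketch in Python; the four rows of digits are invented for the illustration, and only the first four decimal places of b are formed, whereas the proof itself needs the whole infinite sequence:

```python
def diagonal_number(rows):
    """rows[n] holds the decimal digits of the (n+1)th number of the
    list, as a string at least n+1 digits long. Returns the decimal b
    whose nth digit is b_n = 1 if a_nn = 0, and a_nn - 1 otherwise."""
    digits = []
    for n, row in enumerate(rows):
        a_nn = int(row[n])
        digits.append(1 if a_nn == 0 else a_nn - 1)
    return "0." + "".join(str(d) for d in digits)


# Digits after the decimal point of four hypothetical numbers.
rows = ["1415", "7182", "0000", "3333"]
b = diagonal_number(rows)
print(b)

# b differs from the nth number in the nth decimal place.
differs = [b[2 + n] != rows[n][n] for n in range(len(rows))]
print(differs)
```

However the rows are chosen, the number so formed cannot appear among them, since it disagrees with each row at the diagonal place.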

To extend to the whole set of R*, it is only necessary to get a
one-one correspondence between the set of real numbers -1 < x < 1 and
the set of all real numbers y. Such a correspondence is provided by
y = x/(1 - x^2) (-1 < x < 1). See 4.9 Ex. 19 and 20. It follows that the
whole set R* and the real numbers (-1 < x < 1) have the same
cardinal number. Write it c, so that d ≤ c since there is a proper
subset of R* (e.g. the positive integers) which is in one-one
correspondence with J+. But c ≠ d since R* is not countable. Hence, as a
definition and summary of results obtained:




DEFINITION: The non-countable set R* of real numbers has the 
cardinal number c>d. 

In the course of the proof above, it was established that the infinite
set R* could be put into one-one correspondence with a proper
subset of itself, the real numbers (—1<x<1). The set R* with 
cardinal number c contains within itself a part (a proper subset) also 
with the cardinal number c. 
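
The correspondence y = x/(1 - x^2) between the real numbers -1 < x < 1 and all real numbers can be tested numerically. The following is a minimal sketch in Python; the names `to_real` and `to_interval` are invented, and the inverse is written in the form x = 2y/(1 + √(1 + 4y^2)), which is algebraically the same as the expression of 4.9 Ex. 19 for y > 0 but also serves for y ≤ 0:

```python
import math


def to_real(x):
    """Map x with -1 < x < 1 to the real number y = x / (1 - x^2)."""
    return x / (1 - x * x)


def to_interval(y):
    """Inverse map: the root of y*x^2 + x - y = 0 lying in (-1, 1)."""
    return 2 * y / (1 + math.sqrt(1 + 4 * y * y))


samples = [-1000.0, -1.0, -0.1, 0.0, 0.5, 3.0, 1000.0]
round_trip = [(y, to_interval(y), to_real(to_interval(y))) for y in samples]
for y, x, back in round_trip:
    print(y, x, back)
```

Each real y comes back from exactly one x in the interval, up to rounding error; floating-point arithmetic can of course only illustrate, not prove, the one-one correspondence.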

Transfinite arithmetic is now extended further to include in 
sequence: 

1, 2, 3, ..., n, ... d, c (all n < d < c).


At this stage, having shown the possibility of more than one infinite 
number, we can leave the development to those interested in this 
fascinating but (except for the main ideas) not very practical subject. 
As far as addition is concerned, it can be shown that the results (1) 
extend: 


c + n = c; c + d = c; c + c + c + ... = c ...... (2)


It would appear, from (1) and (2), that the sum of two or more 
infinite numbers is simply the greatest of the numbers. This is, in 
fact, the case; and for products as well as sums. It is a reflection of one 
of the rules of Boolean Algebra: A + U =U where U is the universal 
set. Replace U by an infinite cardinal number and A by a lower 
cardinal number, and the main result of transfinite arithmetic for 
sums is obtained. 

There is still some tidying up to do. A finite set is defined (4.6) as 
one which can be put into one-one correspondence with {1, 2, 3, ... n}
for some natural number n. This inductive definition of the finite 
makes use of the characteristic feature (mathematical induction) of 
the natural numbers. A finite set is such that its members can be 
paired off with 1, 2, 3, ... until some n is reached and there is nothing 
left over. The corresponding inductive definition of an infinite set is 
by negation: an infinite set is not finite, not ‘countable’ against any 
natural number n. 

The theorem of 4.6 states that a finite set cannot be put into one- 
one correspondence with a proper subset of itself. There is a property 
here which can be given a convenient label: 


DEFINITION: A set is reflexive if it can be put into one-one
correspondence with a proper subset of itself.


The theorem on finite sets can then be re-stated : 

THEOREM: A finite set is not reflexive.
Now put up for examination the converse:

CONVERSE THEOREM: A non-reflexive set is finite.


We have not yet proved this; clearly we would very much like to do 
so. For the moment, let us take the result as established and see what 
further light is thrown on the nature of the infinite. 

On the inductive definition, an infinite set is one which is not 
finite; it may still be reflexive or non-reflexive. On the converse 
theorem, a set which is not reflexive must be finite, cannot be infinite. 
So the possibility of an infinite and not-reflexive set is ruled out; an 
infinite set must be reflexive. By 4.7, a countably infinite set is 
reflexive; we now see that all infinite sets are reflexive. The charac- 
teristic property of the infinite is reflexivity. With infinite sets it 
becomes possible that two sets of different ‘sizes’ can be made to 
correspond member by member, as the set of even integers corre- 
sponds to the ‘larger’ set of all integers. We have the tidy result: 

(i) A finite set is both inductive (countable against n) and
non-reflexive (not in one-one correspondence with a proper subset).

(ii) An infinite set is both non-inductive (not countable against n)
and reflexive (in one-one correspondence with a proper subset).

Indeed, an alternative and equivalent definition of the finite and 
infinite can be given: an infinite set is one which is reflexive and (by 
negation) a finite set is one which is not infinite. 

The problem remains: can we prove the converse theorem? At first
sight it seems rather an easy matter. Suppose a non-reflexive set A
is infinite. Then select any member a of A and form the set A_1 of all
elements of A except a. Select any member a_1 of A_1 and proceed to
form the set A_2 excluding a_1. Select any member a_2 of A_2 and continue
the process. A countably infinite set B = {a_1, a_2, a_3, ...} emerges as a
proper subset of A, proper since (at least) the element a is not
included. So A = B + C, where C comprises all elements of A not in B.
Now eliminate the element a_1 both from A (to get A') and from B to
get B' = {a_2, a_3, ...}. Then A' = B' + C. There is a one-one
correspondence between B and B', i.e. a_1 with a_2, a_2 with a_3, .... This can be
extended by including the elements of C on both sides of the
correspondence to give a one-one correspondence between A and A'.
Since A' is a proper subset of A, it follows that A is reflexive. But A
is taken at the outset as non-reflexive. Hence, the non-reflexive set
cannot be infinite; it must be finite as stated in the converse theorem.

The proof, however, is defective. A finite sequence a_1, a_2, a_3, ... can
be selected from A. But can an infinite sequence be selected? To say
that it can is, as yet, no more than an intuitive extension from the
finite to the infinite. Can it be justified? Mathematicians are agreed
that it cannot be justified, and that the only way out of the difficulty
is to impose it as an axiom in the theory of sets. This is the Axiom of
Choice which guarantees that an infinite set A contains within it a
countably infinite sequence of elements a_1, a_2, a_3, .... But agreement
goes no further. Mathematicians have been and are still divided on
the question, not whether the Axiom is ‘true’ (which is not the
question to ask about an axiom), but whether it should be permitted to
take its place with the other axioms in the theory of sets. If the
Axiom is so permitted (and at least it is known to be consistent
with the other axioms) then the reflexive property of infinite sets
follows, as do many other results in transfinite arithmetic and
elsewhere in mathematics.† If the Axiom is not permitted, then all the
results which depend on it must also be disallowed. Here is an
awkward choice for mathematicians to make; most accept the Axiom
more or less reluctantly while there are some who refuse to admit it.
Even in mathematics all is not agreed.


† For example, the result that any two cardinal numbers (d, c and others) are
commensurable, one of the relations >, = and < holding between them. There are
also critical results in the calculus which depend on the Axiom.


4.9. Exercises


1. At a particular moment of time, take U as the set of all people ever born
in the world. Consider the following sets

A = present population of the world,
B = all ancestors of the present population of the world,
C = present population of the United Kingdom (Great Britain and N. Ireland),
D = present population of Ireland (Republic of Ireland and N. Ireland).

Show that B' = A and that C ⊂ A, D ⊂ A. What is the set C∪D? Show that
the present population of Great Britain is the set C∩D', of the Republic of
Ireland the set C'∩D and of N. Ireland the set C∩D.

2. Two dice are thrown, giving digits n_1 and n_2. List the set A of all
possibilities, where A = {(n_1, n_2) | n_1 ∈ J_6, n_2 ∈ J_6} for J_6 = {1, 2, 3, 4, 5, 6}.
How many elements has A? How many elements are there in each subset A_r
where A_r consists of the set of throws with sum n_1 + n_2 = r (r = 2, 3, ... 12)?

3. In Ex. 2, interpret the set B = {n | n_1 ∈ J_6, n_2 ∈ J_6, n = n_1 + n_2} and show
that B = {2, 3, ... 12}.

4. The high table at a banquet has a row of seats for the chairman, two
V.I.P.’s (A_1 and A_2) and their wives (B_1 and B_2). Show that there are 24
elements in the set of all seating arrangements, given only that the chairman is
in the centre. How many elements are there in the subset when (apart from
the chairman) the sexes alternate and in that when B_1 and B_2, bitter rivals,
are separated by more than one person?

5. Indicate the sets A∩B, A∩B', A'∩B and A'∩B' on a Venn Diagram,
showing that they are disjoint and partition the universal set U.

6. Draw a Venn Diagram for A'∪B and show that (A'∪B)' = A∩B'.
Translate into the form of the rules of 4.3 and check by rules 9 and 10.

7. Use Venn Diagrams to show: A∩(B∪C) = (A∩B)∪(A∩C); which rule
of 4.3 is this?

8. Draw Venn Diagrams to show: A∪(B∪C) = (A∪B)∪C; A∪A = A;
A∪A' = U. Identify as rules of 4.3. Write and establish the corresponding
rules for ∩.

9. By showing that each is A, establish: (A∪B)∩A = (A∩B)∪(A∩B').

10. An interviewer asks: Do you like this chocolate bar? He conducts inter- 
views and records the results: 


             Yes    No    Don't know
Men           10    20        5
Women         20    15        5
Children      10     5       10


Write A = set of adults, C = set of women and children, Y = set of ‘yes’ answers
and N = set of ‘no’ answers. Identify and find the number of each of the sets:
A', A∩C, (Y∪N), A∩(Y∪N)'. Noting that the 10 yes-men make up the
set C'Y, express each of the above categories in this form.

11. A Boolean algebra of two elements. The set {0, 1} is closed under + and
x, as defined by the tables of 4.4. Check that 0 + (1 + 0) = (0 + 1) + 0,
0 + 1 = 1 + 0, 1 + 1 = 1, 1 x (0 + 1) = 1 x 0 + 1 x 1 and 1 + (0 x 1) = (1 + 0) x (1 + 1).
Complete the checking of the operational rules of 4.3.

12. Difference of sets. If A - B denotes AB' for B ⊂ A, show that U - A = A',
A - A = 0 and A - 0 = A. Show that (A - B) - C = A - (B + C) can be written
if B and C are disjoint and such that B ⊂ A and C ⊂ A.

13. Establish a one-one correspondence for {2n | n ∈ J_5} ~ {2n - 1 | n ∈ J_5}
where J_5 = {1, 2, 3, 4, 5}. Deduce that there are as many even as odd integers
from 1 to 10 inclusive. Why is this not true for 1 to 9 inclusive? Generalise.

14. The results of Ex. 10 can be expressed: if P is the set of adults who say
yes and Q the set of women and children who say yes, then n(P) = 30,
n(Q) = 30, n(PQ) = 20. Check that n(P + Q) = n(P) + n(Q) - n(PQ). Interpret
n(P + Q)' = 60 as the number of those who say no or don’t know.




15. An insurance company classifies a set of 50 ‘lives’ according as they are 
men (or women), married (or single), British (or foreign) and gets the over- 
lapping groups: Men 18, Married 20, British 39; Married men 7, British men 
14, British married persons 16; Married and British men 6. Analyse into non- 
overlapping groups, using the method of 4.6. Show that, out of 11 foreign 
‘lives’, 4 are single women.

16. Because of an error of transcription, the number of married ‘lives’ in 
Ex. 15 is recorded as 25 (not 20). Check that the data are then inconsistent. 

17. Removal of elements from a countable set. From a countable set A, a set B 
is got by removing a countable number of elements. Show that B is countable. 
Interpret when A is finite. If A is countably infinite, show that B must be 
countably infinite when a finite number of elements is removed, and use the 
set of positive integers to show that B may be finite or countably infinite when 
a countably infinite number of elements is removed. 

18. If a and b are any rationals, arrange the elements a + b√2 in double
array, order by diagonals and show that R(√2) is countably infinite.

19. If y = x/(1 - x^2) is defined on the domain of real numbers 0 < x < 1 show
that the range of y is the set of all positive real numbers. Proceed: given
any real y > 0, show that there is a corresponding


x = √{(1/4y^2) + 1} - (1/2y) > 0.


Then, by showing that √{(1/4y^2) + 1} < (1/2y) + 1, deduce that x < 1. Hence
establish a one-one correspondence between the set


X = {x | x a real number, 0 < x < 1}


and the set of all real positive numbers.

20. Extend the result of Ex. 19 by defining y on the domain -1 < x < 1 and
establishing a one-one correspondence between the set of real numbers
-1 < x < 1 and the set of all real numbers.

*21. Show that the set of Gaussian integers (3.2) is countable. 

 *22. Consider the set S of all roots of all polynomial equations with rational 
coefficients. Show that only the corresponding polynomials with integral 
coefficients need be taken, a countably infinite set (4.7). Put the roots in 
sequence by considering the roots of each polynomial equation of degree n 
and deduce that S is countably infinite (even if duplication is not eliminated). 

23. Show that the ‘rule’ that the part is smaller than the whole is equivalent 
to: all sets handled are non-reflexive. Can you say that the ‘rule’ holds only 
for finite sets, failing for any infinite set? 


CHAPTER 5 


STATEMENTS AND PROBABILITY 


5.1. Statements. A simple statement is an assertion. In a given case, 
it is either true or false; it cannot be both. Statements can be made 
without regard to their truth, for they may be sometimes true and 
at other times false. As a notation, represent simple statements by
small letters: p, q, r, .... For example:

(i) p: the child is a boy; q: the child is tall; r: the child has fair
hair

(ii) p: equity prices are high; q: all equity prices are rising; r: some
equity prices are rising

(iii) p: the triangle A is equilateral; q: the triangle A is isosceles;
r: the triangle A has at least two unequal angles.


It is clear, from these examples, that the entities and features re- 
ferred to must be understood, e.g. ‘child’, ‘equity prices’, ‘equilateral’. 
In some cases, moreover, definitions need to be supplied, as in the 
use of ‘tall’ which might be taken as over 65 inches for a twelve-year- 
old child, and for ‘high’ which could be specified, for equities on 
Wall Street, as Standard and Poor’s index (for 420 industrials) of 
over 200, the level of 1947-9 being 100. 

A compound statement is formed from simple ones by the use of 
defined connectives. Three usual connectives correspond to ‘not’, to 
‘or’ and to ‘and’: 

Negation        ~p:   not p      Assertion of the negation of p.

Disjunction     p∨q:  p or q     Assertion of either p or q (or both).

Conjunction     p∧q:  p and q    Assertion of p and q together.


Two other essential connectives are conditions, leading later to
assertions of implication and equivalence:

Conditional     p→q:  if p then q.

Bi-conditional  p↔q:  if p then q and if q then p.




No causal relationship is intended here. Like a simple statement,
any compound statement such as p→q may be true or false (but not
both) under given circumstances. Several connectives may be used
in making further compound statements from those already constructed.
Various examples illustrate:

(i)   ~p: the child is a girl
      q∧r: the child is tall and fair-haired
      (~p)→q: if the child is a girl, then she is tall
(ii)  p∧q: equity prices are high and rising
      ~r: no equity price is rising
      (~q)∧r: some but not all equity prices are rising
      q→r: if all prices are rising, then some prices are rising
(iii) p∨q: the triangle A is either isosceles or equilateral
      ~r: the triangle A has no unequal angles
      p↔(~r): if the triangle A is equilateral then it has no unequal
      angles, and conversely.


In a given problem, there will be a certain number of logical 
possibilities, perhaps only a few, more probably a large number. It 
will be enough to illustrate with a simple example where the logical 
possibilities are few. Consider a triangle A with reference to equal 
or unequal angles. There are five cases only in the set of all logical 
possibilities, as shown below in relation to whether the statements 
p, q and r of (iii) above are true (T) or false (F):


                        p            q          r
Case   Angles A, B, C | Equilateral  Isosceles  At least 2 unequal angles

1      A≠B≠C            F            F          T
2a     A≠B=C            F            T          T
2b     B≠C=A            F            T          T
2c     C≠A=B            F            T          T
3      A=B=C            T            T          F


However, as far as the three statements are concerned (though not
necessarily other statements), there are only three different logical
possibilities, since those labelled 2a, 2b and 2c all have p false, q and r
true. It is enough to carry only three cases: 1, all angles unequal;
2, two angles equal and the other unequal; 3, all angles equal.
Statements such as p, q and r are true in some cases and false in the
others, and so are various compound statements, as shown in the
table:


Case | p  q  r | q∧r | p∨r

1      F  F  T    F     T
2      F  T  T    T     T
3      T  T  F    F     T


The interpretation of q∧r is: the triangle A is isosceles and has at
least two unequal angles, which is true in case 2 and false in cases 1
and 3. However, p∨r is true in all cases, since it means: the triangle
A is equilateral or it has at least two unequal angles.

Most of our interest in statements is concentrated on those which
are true in all logical possibilities, or false in all. One example is
shown in the above table: p∨r is true in all three possible cases.
Such a statement is called logically true. If a statement is false in all
cases (as the statement p∧r), then it is called logically false.

The conditional connectives → and ↔ are most important when
logically true or false. Consider two examples:


Case | p  q | p→q
1      F  F   (T)
2      F  T   (T)
3      T  T    T


The statement p→q: if p then q. As the table shows, p→q is true in
case 3. There is an uncertainty in the other two cases where p is
false; the statement p→q would not seem to apply. But the statement
is not false and so (by convention at least) it can be marked as
true, as shown by (T) in the table. Accepting this situation, we say
that p→q is true in all cases, i.e. logically true. The only situation
which would make p→q false (p true but q false) does not arise. The
interpretation is that, in all cases, if A is equilateral, then it is
isosceles. This situation is described as: p implies q.


Case | p  ~r | p↔(~r)
1      F  F     T
2      F  F     T
3      T  T     T




The statement p↔(~r): if p then (~r) and if (~r) then p. The table
shows that the statements p and (~r) are true (false) in precisely the
same cases. Hence, p↔(~r) is logically true. It could be falsified if p
were true but (~r) false, or if p were false but (~r) true. Neither situation
arises. The interpretation is that, in all cases, if A is equilateral, then
it has no unequal angles, and conversely. The situation is described as:
p and (~r) are equivalent.

The following definitions are thus to be made for any statements: 

DEFINITION: p implies q if p→q is logically true, i.e. ‘if p then q’ is
true in all logical possibilities; p and q are equivalent if p↔q is logically
true, i.e. ‘if p then q’ and ‘if q then p’ are true in all logical possibilities.
Notice that p implies q rules out only one situation (p true but q false);
p and q equivalent rules out two situations (p true but q false, p false
but q true).

Two logically false statements of interest can be illustrated, again
with reference to statements on the triangle A.


Case | p  r | p↔r
1      F  T    F
2      F  T    F
3      T  F    F


The statement p↔r: if p then r and if r then p. The table shows that
the statement is logically false. The situations ruled out are those
in which p and r are both true or both false. Here p and r are
contradictories. It reflects, of course, the equivalence of p and (~r); the
statement that A is equilateral contradicts the statement that A
has at least two unequal angles.


Case | p  q∧r | p∧(q∧r)
1      F  F      F
2      F  T      F
3      T  F      F


The statement p∧(q∧r): p and q and r together asserted. Again the
statement is logically false, but now the only situation ruled out is p
true and (q∧r) true. These two (simpler) statements cannot both be
true; they are described as contraries. In this example, p is that A
is equilateral; (q∧r) is that A is both isosceles and has one angle
unequal to the other two. There is no triangle in which both these
things are true.

The tables with T and F entries against the various logical 
possibilities are instances of truth tables; they are useful tools in 
analysing the logic of statements. Of even more use, however, is the 
representation of statements in terms of sets, illustrated by means of 
Venn Diagrams. 
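
The truth tables of this section can be checked mechanically. The following is a minimal sketch in Python, assuming the three reduced cases of the triangle example; the names CASES, STATEMENTS and classify are illustrative and do not come from the text.

```python
# Logical possibilities for the triangle A, reduced to the three cases of 5.1:
# 1: all angles unequal, 2: exactly two angles equal, 3: all angles equal.
CASES = {
    1: {"p": False, "q": False, "r": True},
    2: {"p": False, "q": True, "r": True},
    3: {"p": True, "q": True, "r": False},
}


def implies(x, y):
    """Conditional x -> y: false only when x is true and y is false."""
    return (not x) or y


def iff(x, y):
    """Bi-conditional x <-> y: true when x and y have the same truth value."""
    return x == y


# Compound statements, each a function of one logical possibility.
STATEMENTS = {
    "q and r": lambda c: c["q"] and c["r"],
    "p or r": lambda c: c["p"] or c["r"],
    "p and r": lambda c: c["p"] and c["r"],
    "p -> q": lambda c: implies(c["p"], c["q"]),
    "p <-> not r": lambda c: iff(c["p"], not c["r"]),
    "p <-> r": lambda c: iff(c["p"], c["r"]),
    "p and (q and r)": lambda c: c["p"] and (c["q"] and c["r"]),
}


def classify(statement):
    """Return 'logically true', 'logically false' or 'contingent'."""
    values = [statement(c) for c in CASES.values()]
    if all(values):
        return "logically true"
    if not any(values):
        return "logically false"
    return "contingent"


for name, statement in STATEMENTS.items():
    row = " ".join("T" if statement(c) else "F" for c in CASES.values())
    print(f"{name:16} {row}  {classify(statement)}")
```

Each printed row lists the truth values in cases 1, 2, 3, followed by the classification. The label 'contingent' is a convenience of the sketch for statements that are true in some cases and false in others.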


5.2. Statements and sets. Consider sets of logical possibilities in a
given problem. A statement p is specified; it is true in some and false
in the other logical possibilities. Denote by P the set of logical
possibilities for which p is true, the truth set of the statement p. Two
bounds can be defined: the universal set U of all logical possibilities
and the empty set ∅. U is the truth set of a statement which is logically
true, ∅ of a statement which is logically false:


∅ ⊆ P ⊆ U for any P.

If two statements p and q are specified, write P for the truth set of p
and Q for the truth set of q. The sets P and Q serve to divide U into
four disjoint and exhaustive sets (one or more of which may be empty).
This is shown in the Venn Diagram of Fig. 5.2, the sets being numbered:

[Fig. 5.2: Venn diagram of the sets P and Q within U, regions numbered 1 to 4]


Set          |  1     2      3      4
             |  P∩Q   P∩Q′   P′∩Q   P′∩Q′
in which:  p |  T     T      F      F
           q |  T     F      T      F


Further: P∪Q = union of sets 1, 2 and 3 in which either p or q true,
(P∪Q)′ = P′∩Q′ = set 4 in which both p and q false.
Consequently, compound statements under the three connectives:
~ = ‘not’; ∨ = ‘or’; ∧ = ‘and’


are represented by sets according to the scheme: 




Statement | Truth Set | No. in Venn Diagram
~p          P′          3 and 4
p∨q         P∪Q         1, 2 and 3
p∧q         P∩Q         1


The connective ~ corresponds to complement ′, ∨ to union ∪ and ∧ to
intersection ∩ in the operations on sets. The parallel is perfect so that:


THEOREM: The algebra of statements under the connectives ‘not’, ‘or’,
‘and’ is a Boolean Algebra.


Statements are to be translated into their truth sets and the opera- 
tional rules of 4.3 applied. In this Boolean Algebra, any logically 
true statement plays the role of the upper bound U and any logically 
false statement the lower bound ∅. In between are statements true
in some and false in other logical possibilities. 

The translation of implication and equivalence into set terms can
be easily achieved. The statement p→q is false only when p is true
but q false. If P and Q are the truth sets, p→q is false only in the set
P∩Q′, which is numbered 2 in the Venn Diagram. Hence the truth
set of p→q, comprising the union of sets 1, 3 and 4 in the diagram,
is the complement of P∩Q′:


Truth set of p→q = (P∩Q′)′ = P′∪(Q′)′ = P′∪Q


by Boolean Algebra. As a check, since P′ is sets 3 and 4 and Q is sets
1 and 3 in the diagram, P′∪Q is sets 1, 3 and 4. It also follows that the
statement p→q is equivalent to (~p)∨q, each having the truth set
P′∪Q.
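
The correspondence between connectives and set operations can be tried out directly with Python sets. This is a small sketch, assuming the four numbered regions of the Venn Diagram as the universal set; the helper complement and the dictionary truth_set are illustrative names, not part of the text.

```python
# The four regions of the Venn diagram, numbered as in the text.
U = {1, 2, 3, 4}
P = {1, 2}  # truth set of p
Q = {1, 3}  # truth set of q


def complement(s):
    """Complement of s within the universal set U."""
    return U - s


# Region-by-region evaluation of the connectives.
truth_set = {
    "not p": {x for x in U if x not in P},
    "p or q": {x for x in U if x in P or x in Q},
    "p and q": {x for x in U if x in P and x in Q},
    "p -> q": {x for x in U if not (x in P and x not in Q)},
}

# The same sets obtained from the set operations.
assert truth_set["not p"] == complement(P)
assert truth_set["p or q"] == P | Q
assert truth_set["p and q"] == P & Q
assert truth_set["p -> q"] == complement(P & complement(Q)) == complement(P) | Q

print(sorted(truth_set["p -> q"]))  # [1, 3, 4]
```

The last assertion is the Boolean Algebra step of the text: the complement of P∩Q′ coincides with P′∪Q.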

The statement p implies q is defined as p→q logically true. This
means, in terms of sets:


P′∪Q = U or P∩Q′ = ∅


i.e. the truth set of p→q is the universal set, the set in which p→q is
false is the empty set. Check by Boolean Algebra:

If P′∪Q = U, then ∅ = U′ = (P′∪Q)′ = (P′)′∩Q′ = P∩Q′.

Now if P∩Q′ is empty, then P and Q′ are disjoint and P must be
contained in Q: P ⊆ Q. In terms of truth sets, ‘p implies q’ means
P ⊆ Q. Equivalence follows immediately. The statements p and q
are equivalent if p implies q and q implies p. In set terms, P ⊆ Q and
Q ⊆ P, which means P = Q. The truth sets of p and q are the same.
Hence:

THEOREM: The statement p implies the statement q if and only if the
truth set of p is a subset of the truth set of q (P ⊆ Q). The statements p
and q are equivalent if and only if the truth sets coincide (P = Q).


It also follows that, if p implies q without q implies p, then the truth
set of p is a proper subset of the truth set of q (P ⊂ Q).

An immediate consequence is: p implies q means P ⊆ Q and this in
its turn means Q′ ⊆ P′, i.e. that (~q) implies (~p). Conversely,
(~q) implies (~p) means Q′ ⊆ P′, which means P ⊆ Q or p implies q.
Hence:


THEOREM: The statements ‘p implies q’ and ‘(~q) implies (~p)’ are
equivalent.
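
The set form of this theorem, P ⊆ Q exactly when Q′ ⊆ P′, can be spot-checked by brute force. The sketch below runs through every pair of subsets of a small universal set; it is an exhaustive check on one hypothetical four-element universe, not a proof, and the function names are illustrative.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as a list of sets."""
    items = sorted(universe)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def contrapositive_agrees(universe):
    """True if 'P is a subset of Q' and 'Q' is a subset of P'' always agree."""
    for P in subsets(universe):
        for Q in subsets(universe):
            direct = P <= Q                              # p implies q
            indirect = (universe - Q) <= (universe - P)  # (~q) implies (~p)
            if direct != indirect:
                return False
    return True


# Triangle example of 5.1: cases 1, 2, 3; p is true in case 3, q in cases 2 and 3.
U = {1, 2, 3}
P, Q = {3}, {2, 3}
print(P <= Q, (U - Q) <= (U - P))           # True True
print(contrapositive_agrees({1, 2, 3, 4}))  # True
```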

This is a result which has a bearing on the methods of proof used
in mathematics. If we are given a result or condition p and if we
wish to deduce a consequential result or condition q, the direct
method of proof is to show p implies q, i.e. if p then q in all logical
possibilities. It is equally valid to use an indirect method of proof,
to show (~q) implies (~p). That is, given p, assume the contrary
(~q) of what is to be established and go on to deduce (~p). Hence a
contradiction, both p and (~p). This means that the contrary
assumption (~q) must be abandoned and q is established. This is the
basis of the proof by reductio ad absurdum. As a simple illustration,
take the following not-quite-trivial example. Given that the product
of two even positive integers is even, it follows that n² odd implies
n odd (n a positive integer). The proof is indirect. Take n² odd (given)
and assume n even. Then n² = n × n is even. There is a contradiction,
n² odd and n² even. Hence n is odd.


5.3. Necessary and sufficient conditions. In many mathematical
developments, instead of proceeding from proposition to proposition
(p implies q, q implies r, ...), we may take a property q and seek the
conditions p for it to hold. In such a situation, we try to find p so that
‘q if p’, or ‘q only if p’, or ‘q if and only if p’. A good deal of
uncertainty and confusion of thought is possible in this apparently simple
procedure. And not without reason. For one thing, the use of the
alternatives, p implies q and (~q) implies (~p), serves to disguise
the fact that these are the same. Confusion is then made worse by
the common habit of employing various terms or modes of expression
for the same thing. Since there are alternative terms in use, we can
do no more than be on the lookout for the various disguises.

First, take p implies q. This can be expressed: if p then q in all
logical possibilities; or, omitting the qualification, just: ‘if p then q’.
This can be turned around to read: ‘q if p’. This is what is described
as a sufficient condition; here p is a sufficient condition for q.

Second, take (~q) implies (~p). Omitting the same qualification,
we can say: ‘if (~q) then (~p)’, which is the same as: ‘only if q, then
p’. This is turned around to read: ‘p only if q’. This is what is
described as a necessary condition; here q is a necessary condition for p.

Confusion can be avoided only by recognising and keeping always
in mind the simple fact that all the statements written in the two
preceding paragraphs are precisely the same. If p and q have truth
sets P and Q, all statements correspond quite simply to P ⊆ Q. The
first run of statements stems from p implies q, the second from (~q)
implies (~p). These two are equivalent; both mean P ⊆ Q. Hence,
if p and q are such that P ⊆ Q for their truth sets, then all the
following statements and notations follow:


Implication     | One phrasing   | Alternative phrasing | Terminology

p implies q     | if p then q    | q if p               | p is a sufficient condition for q

~q implies ~p   | if ~q then ~p  | p only if q          | q is a necessary condition for p


A similar table can be written for all the ways of writing: q implies p.
It differs only in that p and q are interchanged. For example, if we
are seeking p as a necessary condition for q, we would look at this
second table, or (what is the same thing) interchange p and q in the
table above.

Turn now from implication to equivalence, i.e. take both p implies 
q and q implies p. There is a completely symmetrical relationship 
between p and q. Setting down in full all the ways of putting the 
position: if p and q are such that their truth sets are equal (P =Q) 
then all the following hold: 




Equivalence        | One phrasing     | Alternative phrasing | Terminology

p and q equivalent | if p then q and  | q if and only if p   | p is a necessary and
                   | if ~p then ~q    |                      | sufficient condition for q

q and p equivalent | if q then p and  | p if and only if q   | q is a necessary and
                   | if ~q then ~p    |                      | sufficient condition for p


Notice that, in the phrase ‘if and only if’, the necessary part is ‘only
if’ and the sufficient part is ‘if’. A necessary and sufficient condition
can only be established in two stages: first q only if p, meaning that
for q to hold we must exclude all things other than what follows the
‘only if’ (i.e. exclude everything except p); second q if p, meaning that
q holds if we include everything that follows the ‘if’ (i.e. include p);
and so together q if and only if p, the necessary and sufficient
condition for q.

Two examples from elementary geometry illustrate:

(i) Conditions are sought for q: A is an isosceles triangle. Write
p: A is an equilateral triangle. Then p implies q. For, by elementary
geometry, an equilateral triangle is isosceles. Hence q if p, and p is a
sufficient condition for q. We have, in fact, overdone the conditions
for an isosceles triangle. Certainly an equilateral triangle is isosceles,
but many other triangles are isosceles too. p is not a necessary
condition.

Turn things around and seek conditions for p: A is an equilateral
triangle. Write q: A is an isosceles triangle. Then (~q) implies (~p).
For, if A is not isosceles, it is not equilateral either. Hence p only if q,
and q is a necessary condition for p. We have, in fact, not got enough
to ensure that the triangle is equilateral. q is not a sufficient
condition.

These two lines of reasoning are precisely the same. We put it one
way or the other according to our view: seeking conditions for q
(isosceles triangle) or seeking conditions for p (equilateral triangle).

(ii) Conditions are sought for q: the quadrilateral Q is a rectangle.
More specifically, we seek conditions p in terms of the diagonals of Q.

Write p: the diagonals are equal and bisect at right angles. Then p
implies q. This is indicated by A in Fig. 5.3 where the solid lines show
the diagonals under p, and the dotted lines show the resulting
quadrilateral Q (clearly a rectangle). Hence q if p, and
p is a sufficient condition for q. Again, we have overdone it;
Q is a rectangle all right, in fact a square.

Write p: the diagonals bisect each other. Then q
implies p. This is indicated by B in the diagram,
solid lines showing the rectangle (q) and dotted
lines the diagonals under p. Hence q only if p, and
p is a necessary condition for q. The position can also be put: (~p)
implies (~q), i.e. if the diagonals do not bisect, then Q is not a
rectangle. Here we do not have enough to ensure a rectangle;
bisecting diagonals can produce a parallelogram (C of the diagram).

Write p: the diagonals are equal and bisect each other. Then p
implies q, i.e. q if p. If we draw equal and bisecting diagonals, we
must get a rectangle. Further, q implies p, or (~p) implies (~q), i.e.
q only if p. We cannot draw a rectangle without diagonals equal and
bisecting. So: q if and only if p, and p is the necessary and sufficient
condition for q.
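
Example (i) of this section can be recast as a subset test on truth sets. The sketch below is only an illustration: it assumes a hypothetical universal set of triangles with integer sides up to 5, and the function names are not from the text.

```python
def triangles(max_side):
    """Integer-sided triangles (a, b, c) with a <= b <= c <= max_side."""
    return [
        (a, b, c)
        for a in range(1, max_side + 1)
        for b in range(a, max_side + 1)
        for c in range(b, max_side + 1)
        if a + b > c  # triangle inequality
    ]


def is_equilateral(t):
    a, b, c = t
    return a == b == c


def is_isosceles(t):
    # At least two equal sides, so an equilateral triangle counts as isosceles.
    # Sides are sorted, so a == c would already force a == b == c.
    a, b, c = t
    return a == b or b == c


U = set(triangles(5))
P = {t for t in U if is_equilateral(t)}  # truth set of p
Q = {t for t in U if is_isosceles(t)}    # truth set of q

p_sufficient_for_q = P <= Q  # p implies q
p_necessary_for_q = Q <= P   # q implies p

print(p_sufficient_for_q, p_necessary_for_q)  # True False
```

The triangle with sides (2, 2, 3) lies in Q but not in P, which is why p is sufficient but not necessary for q in this universe.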


5.4. Probability. The concept of probability has to do with state- 
ments, with assertions or propositions. Consider a group of twelve- 
year-old boys; the statement: the boy is tall (over 65 inches) may 
be true for some and false for others in the group. Similarly, if we 
measure equity prices by Standard and Poor’s industrials index, the 
statement: equity prices are rising is sometimes true and sometimes 
false. In each case, there are two alternatives: boys tall or not tall, 
prices rising or not rising. How ‘likely’ is such a statement to be true? 
Can we put a measure on the ‘likelihood’? For example, we may say
that ‘not many’ boys are tall and we may assess the ‘chance’ at well
under 1/2, say 10 : 90 or 10 chances out of 100. In considerations of
this kind, a variety of rather ill-defined terms tends to crop up: 
probable, likely, chances for or against, degrees of confidence. All of 
them apply to a particular statement which is under discussion. 

The same general ideas are often expressed in another way. A
situation is considered in which the outcome is ‘uncertain’. For
example, some observations may be made on the movement of equity
prices during a week. In a set of observations there is a range of
‘outcomes’, e.g. prices rise or fall by various amounts on different days.
In one observation there is only one ‘outcome’. If we do not know it,
what is the chance that it is this or that? For the equity market on the
current day, what is the chance of a price rise? Recent observations
may assist us, but we may be reduced to saying that a rise and a fall
are ‘equally likely’; we assess the uncertain outcome at a chance of
1/2 for a rise. Other rather ill-defined terms tend to appear: doubt,
uncertainty, need for confirmation. They apply to outcomes in a
particular situation.

The second formulation places the emphasis on the uncertainty 
of events, rather than on a statement which may or may not be true. 
But, however we look at the matter, the basic concept is a statement. 
We state a proposition: equity prices will rise; the chances are then
assessed that the proposition is true.

To fix ideas consider two simple examples: 

(i) This is a familiar, if rather artificial, kind of experiment for
illustrating probabilities: two dice are thrown and the sum n = n₁ + n₂
of the two digits appearing is written down. We look for 7 or 11.
How likely is the proposition that we get n = 7 or 11? The first task
is to specify the set of all possible outcomes. If we look at the digits
n₁ and n₂ separately, there are 36 pairs:


Case n₁ n₂  n | Case n₁ n₂  n | Case n₁ n₂  n | Case n₁ n₂  n

  1   1  1  2 |  10   2  4  6 |  19   4  1  5 |  28   5  4  9
  2   1  2  3 |  11   2  5  7 |  20   4  2  6 |  29   5  5 10
  3   1  3  4 |  12   2  6  8 |  21   4  3  7 |  30   5  6 11
  4   1  4  5 |  13   3  1  4 |  22   4  4  8 |  31   6  1  7
  5   1  5  6 |  14   3  2  5 |  23   4  5  9 |  32   6  2  8
  6   1  6  7 |  15   3  3  6 |  24   4  6 10 |  33   6  3  9
  7   2  1  3 |  16   3  4  7 |  25   5  1  6 |  34   6  4 10
  8   2  2  4 |  17   3  5  8 |  26   5  2  7 |  35   6  5 11
  9   2  3  5 |  18   3  6  9 |  27   5  3  8 |  36   6  6 12


We may decide that the dice are without bias in the sense that all 36
outcomes are ‘equally likely’. Six of them produce n = 7 (cases 6, 11,
16, 21, 26, 31) and two of them give n = 11 (cases 30, 35). Together,
the proposition n = 7 or 11 is true in 8 cases out of 36. It is then an
easy step to assess the chance that the proposition is true at 8 : 28
or 8/36 = 2/9.

To illustrate, as in the example of the triangle in 5.1, that the
mode of specification of the logical possibilities is not unique, define
outcomes solely in terms of the sum n obtained at a throw. There are
11 of them:


Sum n                 |  2  3  4  5  6  7  8  9  10  11  12

Chance μ (out of 36)  |  1  2  3  4  5  6  5  4   3   2   1


Even if there is no bias in the dice, we decide that the outcomes are 
not ‘equally likely’ and we attach assessments of the relative likeli- 
hood of one outcome as opposed to another. One set of assessments 
μ, adding to 36, is shown above. This is, in fact, obtained by going 
through the same process as before (e.g. n =7 arises in 6 out of 36 
cases); but it may be got by some other procedure (e.g. by experi- 
ment). The point is that something must be assumed: equal chances 
for the 36 outcomes, unequal chances as shown for the 11 outcomes. 
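
Both specifications of the dice example can be generated by enumeration. The following sketch assumes unbiased dice, numbers the 36 pairs in the same order as the table of cases, and then counts how many pairs give each sum; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Specification 1: the 36 ordered pairs (n1, n2), assumed equally likely,
# numbered 1 to 36 in the order (1,1), (1,2), ..., (6,6).
pairs = list(product(range(1, 7), repeat=2))
hits = [case for case, (n1, n2) in enumerate(pairs, start=1) if n1 + n2 in (7, 11)]
chance_from_pairs = Fraction(len(hits), len(pairs))

# Specification 2: the 11 possible sums, each weighted by its chances out of 36.
weights = Counter(n1 + n2 for n1, n2 in pairs)

print(hits)                                # [6, 11, 16, 21, 26, 30, 31, 35]
print([weights[n] for n in range(2, 13)])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print(chance_from_pairs)                   # 2/9
```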

(ii) The heights of a group of 100 twelve-year-old boys are re- 
corded. We are interested in tall boys (over 65 inches). How likely is 
the proposition that one selected boy is tall? The specification of all 
possible outcomes raises difficulties. In a narrow sense, there are 100, 
the actual records. But this is not the problem really considered; the 
records are a ‘sample’ in some sense from a wider range of possible 
outcomes. Suppose we can set limits to the height x inches, e.g.
50 ≤ x ≤ 70. Conceptually, x is any real number in this range, a
non-countable infinity of outcomes. In practice, x is certainly a rational
number, usually one with 10, 100, 1000, ... in the denominator according
to the measuring equipment used, e.g. 100 if heights are read to
the second decimal place. Then x = n/100 and, for 50 ≤ x ≤ 70, n is
any integer from 5000 to 7000 inclusive, i.e. x can take a finite but
large (here 2001) number of values. There are too many to handle and
some rounding is needed in specifying the outcomes, as in the 
following approximation: 


Height x               |  52  54  56  58  60  62  64  66  68  70

Chance μ (out of 100)  |   1   1   3  11  26  32  16   7   2   1


Here height is specified in ranges of 2 inches and the interpretation
(e.g.) of the first case is: 52 ± 1, i.e. over 51 but not over 53. The
question of assessing the chance of getting each x (here reduced to
10 cases) still arises. This may be done in various ways, from
theoretical considerations or from empirical distributions of heights.
The set of assessments μ in the table is illustrative; it gives the
chance of a tall boy as 1/10, i.e. 7+2+1=10 chances out of 100.

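
The arithmetic for the tall boy can be laid out as a short sketch. It assumes the illustrative table of chances above, keyed by the centre of each 2-inch class; the names chances and TALL_OVER are introduced here only for the example.

```python
from fractions import Fraction

# Chances out of 100 for each 2-inch class, keyed by class centre in inches.
chances = {52: 1, 54: 1, 56: 3, 58: 11, 60: 26, 62: 32, 64: 16, 66: 7, 68: 2, 70: 1}

TALL_OVER = 65  # 'tall' means over 65 inches

total = sum(chances.values())
# A class centred at c covers heights over c - 1 but not over c + 1, so it
# lies wholly above 65 inches when c - 1 >= 65.
tall = sum(w for centre, w in chances.items() if centre - 1 >= TALL_OVER)
p_tall = Fraction(tall, total)

print(total, tall, p_tall)  # 100 10 1/10
```
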
Probability theory is an attempt to quantify the idea of the chance 
of a statement being true and to devise an algebra to handle the 
chances. The problem is of a kind lending itself to a variety of 
treatments and, indeed, to controversy. One approach is suggested 
here: to switch from properties of statements to the corresponding 
properties of their truth sets. The relation between them is given in 
5.2. If the statement a₁ has truth set A₁ and if a₂ has truth set A₂,
then:


Statement | ~a₁   a₁∨a₂    a₁∧a₂    a₁→a₂

Truth set | A₁′   A₁∪A₂    A₁∩A₂    A₁′∪A₂


What we must do is to attach a measure μ(A) to a set A, and then to
equate it to the probability P(a) of the statement a with truth set A.


5.5. Probability measure. The definition of the set measure
μ(A) = P(a) must be such that certain requirements are satisfied,
those set by the everyday use of probability. The chance of throwing
a 5 in one throw of a die is 1/6, if the die has no bias; the chance of a
6 is the same. What is the chance of getting a 5 or a 6 in one throw?
The answer we expect is 1/6 + 1/6 = 1/3. What is the chance of getting a 6
followed by a 6 in two throws? We expect 1/6 × 1/6 = 1/36. Similarly, if we
ignore the fact that bookies work for profit, we interpret odds of
3 to 1 against a horse in one race as: chance of horse winning is 1/4.
For a horse in another race, odds of 5 to 3 against represent a chance
of winning of 3/8. What is the chance that either one horse wins, that
both horses win? We expect 1/4 + 3/8 = 5/8 (or 5 to 3 on) for either;
1/4 × 3/8 = 3/32 (29 to 3 against) for both.
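
The everyday arithmetic of this paragraph can be reproduced with exact fractions. This is a sketch only: the two helper functions for converting between odds and chances are illustrative and are not defined in the text. The figure for ‘either’ is the straight sum quoted in the paragraph; requirement (ii) below allows straight addition only for exclusive statements.

```python
from fractions import Fraction


def chance_from_odds_against(against, in_favour):
    """Odds of 'against to in_favour against' expressed as a chance of winning."""
    return Fraction(in_favour, against + in_favour)


def odds_against(chance):
    """Return odds (against, in_favour) in lowest terms, for 0 < chance < 1."""
    ratio = (1 - chance) / chance
    return ratio.numerator, ratio.denominator


die = Fraction(1, 6)
five_or_six = die + die  # exclusive outcomes of one throw
double_six = die * die   # two separate throws

first_horse = chance_from_odds_against(3, 1)
second_horse = chance_from_odds_against(5, 3)
both_win = first_horse * second_horse
quoted_either = first_horse + second_horse  # the straight sum quoted in the text

print(five_or_six, double_six)                     # 1/3 1/36
print(first_horse, second_horse)                   # 1/4 3/8
print(both_win, odds_against(both_win))            # 3/32 (29, 3)
print(quoted_either, odds_against(quoted_either))  # 5/8 (3, 5)
```

Odds of (3, 5) against are the same thing as 5 to 3 on.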

Such requirements can be translated into properties of the
probability P(a) of a statement a, matched by properties of the measure
μ(A) of the corresponding truth set. With a little care, and leaving
only one matter open, we state the object of the exercise of defining
μ(A) = P(a) in the following terms:




      Probability measure P(a)             |  Set measure μ(A)

(i)   P(a) = 0 if and only if a is         |  μ(A) = 0 if and only if A = ∅
        logically false                    |
      P(a) = 1 if and only if a is         |  μ(A) = 1 if and only if A = U
        logically true                     |
      0 ≤ P(a) ≤ 1 any a                   |  0 ≤ μ(A) ≤ 1 any A, ∅ ⊆ A ⊆ U
(ii)  P(a₁∨a₂) = P(a₁) + P(a₂)             |  μ(A₁∪A₂) = μ(A₁) + μ(A₂)
        if a₁ and a₂ are exclusive         |    if A₁∩A₂ = ∅
        (a₁∧a₂ logically false)            |    (A₁ and A₂ disjoint)
(iii) P(a₁∧a₂) = P(a₁) × P(a₂)             |  μ(A₁∩A₂) = μ(A₁) × μ(A₂)
        if a₁ and a₂ are independent       |    for certain types of sets A₁ and A₂
        (in some way to be defined)        |


Requirement (i) is that a probability lies between 0 (logically false) 
and 1 (logically true). Requirement (ii) is the additive or disjunctive 
property of exclusive or non-overlapping statements, i.e. the chance 
of either is the sum of the separate chances. Requirement (iii) is the 
multiplicative or conjunctive property for statements which do not 
depend on each other (in a way to be made precise), i.e. the chance 
of both is the product of the separate chances. 

The question of a suitable definition of μ(A) = P(a) is pursued
here only in a simple case: the set U of all logical possibilities
(outcomes) is finite. The reason for the simplicity is easily given in
general terms. If there is only a finite number n of discrete outcomes,
then we can if we wish say that they are equally probable. In terms
of sets, we can assert:

μ(Aᵣ) = μ (constant, all r) if Aᵣ is the truth set of the rth outcome.


The n sets Aᵣ (r = 1, 2, ... n) have U as their union:
A₁∪A₂∪ ... ∪Aₙ = U.
Then, by the additive property (ii), extended to several sets:
μ(A₁) + μ(A₂) + ... + μ(Aₙ) = μ(U) = 1.
So: nμ = 1 i.e. μ = 1/n.
This makes good sense: if there are n equi-probable outcomes, the
chance of one of them is 1/n. It is illustrated by example (i) of 5.4.




The case we do not consider further arises when U is countably or 
non-countably infinite, i.e. an infinite number of possible outcomes. 
Any attempt to define set measure and probability in this case takes 
us too far into measure theory. That the case is by no means
unimportant is seen by reference to example (ii) of 5.4. The universal
set U is here of the form (x₁ ≤ x ≤ x₂) where x inches is height (limits
of x₁ and x₂ being set) and where x is a real number. The kind of
probability measure sought is then P(x>65); the set for which
x>65 is non-countably infinite like U itself. This is the situation
considered in theoretical statistics which deals with a ‘random
variable’ x subject to a defined probability distribution.* The
difficulty when U is infinite is that we cannot assign a non-zero measure
to each outcome if we say that they are equi-probable.

The procedure for defining probability measure in a finite set of
outcomes draws upon the results of 4.6 for counting sets. If there are
n outcomes, assign the weight μᵣ to the rth outcome (r = 1, 2, 3, ... n)
in such a way that Σ μᵣ = 1 (summed over all r). This is a matter of
appropriate assumption in each case. A statement a has a finite truth
set A, a subset of the set U of all outcomes. Let the number of elements
in A be n(A) where 0 ≤ n(A) ≤ n. Add the weights attached to the elements
of A and define as the measure of A, i.e. μ(A) = Σ μᵣ where Σ extends over
the n(A) elements of A. Finally, the probability P(a) of the statement a
is μ(A).

DEFINITION: The measure of a subset A of a finite set of possible
outcomes with weights μᵣ (Σ μᵣ = 1, r = 1, 2, 3, ... n) is μ(A) = Σ_A μᵣ, where
Σ_A is summation over elements of A, and the probability of the statement
a with truth set A is P(a) = μ(A).
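
The definition translates almost word for word into a function. The sketch below is a minimal illustration, assuming a hypothetical universal set of three outcomes with made-up weights; the function name measure and the outcome labels are not from the text.

```python
from fractions import Fraction


def measure(weights, truth_set):
    """mu(A): the sum of the weights of the outcomes in the truth set A."""
    if sum(weights.values()) != 1:
        raise ValueError("the weights must add to 1")
    if not set(truth_set) <= set(weights):
        raise ValueError("the truth set must be a subset of U")
    return sum((weights[r] for r in truth_set), Fraction(0))


# Hypothetical weights for three outcomes (an assumption for illustration).
weights = {"r1": Fraction(1, 2), "r2": Fraction(1, 3), "r3": Fraction(1, 6)}
U = set(weights)
A1, A2 = {"r1"}, {"r3"}  # two disjoint truth sets

print(measure(weights, set()))    # 0
print(measure(weights, U))        # 1
print(measure(weights, A1 | A2))  # 2/3
print(measure(weights, A1) + measure(weights, A2))  # 2/3
```

The last two lines agree because A1 and A2 are disjoint, which is the additive requirement (ii).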


Properties (i) and (ii) are then satisfied. If a is logically false, A = ∅
and μ(A) = 0; if a is logically true, A = U and μ(A) = Σ μᵣ = 1. For
any A, the sum Σ_A μᵣ over the elements of A satisfies Σ_A μᵣ ≥ 0 and
Σ_A μᵣ ≤ 1. So 0 ≤ μ(A) ≤ 1, as required.

Further, if A₁ and A₂ are disjoint, then (as in 4.6) the number of
elements in A₁ and the number of elements in A₂ add to the number
of elements in A₁∪A₂. So do the weights μᵣ and
μ(A₁∪A₂) = μ(A₁) + μ(A₂).
* See Goldberg: Probability, An Introduction (Prentice-Hall, 1959). 




The interpretation of property (iii) is left over.
In the particular case of equi-probable measure:


μᵣ = μ (r = 1, 2, ... n).


Then: 1 = Σ μᵣ = Σ μ = nμ i.e. μ = 1/n

and for a set A of n(A) elements: μ(A) = Σ_A μ = n(A)μ = n(A)/n. Hence:


THEOREM: If n outcomes are equi-probable, then the probability of a
statement a is P(a) = n(A)/n, where n(A) is the number of outcomes for
which a is true.


As an illustration, consider example (i) of 5.4: throws of two dice,
assumed to be without bias. There are 36 equi-probable outcomes
and μᵣ = 1/36 (all r). For the statement a that the sum of the digits is
7 or 11, the truth set consists of 8 outcomes: P(a) = 8/36 = 2/9. On
the other hand, specify 11 outcomes with μᵣ (r = 1, 2, ... 11) as given
in 5.4. This is not an equi-probable specification. The statement a
has truth set A: the 6th outcome (μ₆ = 6/36) and the 10th (μ₁₀ = 2/36).
Hence μ(A) = 6/36 + 2/36 = 2/9 and P(a) = 2/9 again.
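
The agreement between the two specifications can be confirmed with a weighted measure. The sketch below assumes unbiased dice and builds the weights of the 11 sums from the 36 pairs; measure is an illustrative helper, not a function named in the text.

```python
from fractions import Fraction
from itertools import product


def measure(weights, truth_set):
    """mu(A): add the weights of the outcomes in the truth set A."""
    return sum((weights[outcome] for outcome in truth_set), Fraction(0))


# Specification 1: 36 equi-probable ordered pairs, weight 1/36 each.
pairs = list(product(range(1, 7), repeat=2))
pair_weights = {pair: Fraction(1, len(pairs)) for pair in pairs}
A_pairs = {pair for pair in pairs if sum(pair) in (7, 11)}

# Specification 2: the 11 sums, each weighted by the pairs that produce it.
sum_weights = {n: Fraction(0) for n in range(2, 13)}
for pair in pairs:
    sum_weights[sum(pair)] += Fraction(1, 36)
A_sums = {7, 11}

p_equiprobable = measure(pair_weights, A_pairs)
p_by_counting = Fraction(len(A_pairs), len(pairs))  # the theorem: n(A)/n
p_weighted = measure(sum_weights, A_sums)

print(p_equiprobable, p_by_counting, p_weighted)  # 2/9 2/9 2/9
```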


5.6. Properties of probability measure. The statements a₁ and a₂ are
exclusive if both cannot be true together, i.e. if a₁∧a₂ is logically
false and P(a₁∧a₂) = 0. The corresponding truth sets are disjoint:
A₁∩A₂ = ∅. This is the case already discussed, giving
P(a₁∨a₂) = P(a₁) + P(a₂).
However, if a₁ and a₂ are not exclusive, then P(a₁∧a₂) ≠ 0 and
A₁∩A₂ ≠ ∅, i.e. A₁ and A₂ overlap. The number of elements in
(A₁∪A₂) is then given by the result of 4.6, i.e.
n(A₁∪A₂) = n(A₁) + n(A₂) − n(A₁∩A₂)
and similarly for the weights μᵣ, so that
μ(A₁∪A₂) = μ(A₁) + μ(A₂) − μ(A₁∩A₂)
i.e. P(a₁∨a₂) = P(a₁) + P(a₂) − P(a₁∧a₂)        (1)

Consider the statement: $a_1$ given $a_2$. As a statement, it has a
probability, namely the probability of $a_1$ given $a_2$. There is a different
statement: $a_2$ given $a_1$, and so a different probability, the probability
of $a_2$ given $a_1$. So:


DEFINITION: The conditional probability $P(a_1 \mid a_2)$ is the probability
of $a_1$ given $a_2$; similarly $P(a_2 \mid a_1)$ is the probability of $a_2$ given $a_1$.

The truth set of $a_1$ given $a_2$ is $A_1 \cap A_2$, but this must be related, not
to the universal set U, but to the set $A_2$ since $a_2$ is given. Hence to
get $P(a_1 \mid a_2)$ the measure $\mu(A_1 \cap A_2)$ must be related to $\mu(A_2)$.
Strictly: a new universal set $A_2$ is taken and the weights of the
elements in $A_2$ re-scaled to add to unity. The original weights are
$\mu_r$, adding to $\sum_{A_2} \mu_r = \mu(A_2)$. They are re-scaled to $\frac{\mu_r}{\mu(A_2)}$, adding to
$\sum_{A_2} \frac{\mu_r}{\mu(A_2)} = \frac{\mu(A_2)}{\mu(A_2)} = 1$. $P(a_1 \mid a_2)$ is the (re-scaled) measure of
$A_1 \cap A_2$:

$P(a_1 \mid a_2) = \sum_{A_1 \cap A_2} \frac{\mu_r}{\mu(A_2)} = \frac{1}{\mu(A_2)} \sum_{A_1 \cap A_2} \mu_r = \frac{\mu(A_1 \cap A_2)}{\mu(A_2)}$

Similarly: $P(a_2 \mid a_1) = \frac{\mu(A_1 \cap A_2)}{\mu(A_1)}$.


Since $\mu(A_1)=P(a_1)$, $\mu(A_2)=P(a_2)$ and $\mu(A_1 \cap A_2)=P(a_1 \land a_2)$, we
have:
$P(a_1 \land a_2) = P(a_1 \mid a_2)\,P(a_2) = P(a_2 \mid a_1)\,P(a_1)$ ...... (2)
Suppose $a_1$ and $a_2$ are such that $P(a_1 \mid a_2) = P(a_1)$. Then, by (2):

$P(a_2 \mid a_1) = \frac{P(a_1 \mid a_2)\,P(a_2)}{P(a_1)} = P(a_2)$.

Hence, it is true both that $P(a_1 \mid a_2) = P(a_1)$ and that $P(a_2 \mid a_1) = P(a_2)$.
In other words, each statement is irrelevant to the other. In this case,
the statements $a_1$ and $a_2$ are said to be independent. Result (2) then
gives:
$P(a_1 \land a_2) = P(a_1)\,P(a_2)$ if $a_1$ and $a_2$ are independent ...... (3)

Just as exclusive statements allow straight addition of probabilities,
so independent statements allow straight multiplication of probabilities.

An assembly of the results obtained, and specifically the results 
(1), (2) and (3) just established, gives the properties of probability 
measure: 




THEOREM
(i) Bounds. $0 \le P(a) \le 1$ for any statement a
and $P(\sim a) = 1 - P(a)$.

(ii) Disjunction. $P(a_1 \lor a_2) = P(a_1) + P(a_2) - P(a_1 \land a_2)$
for any statements $a_1$ and $a_2$.
In particular: $P(a_1 \lor a_2) = P(a_1) + P(a_2)$ if $a_1$ and $a_2$ are exclusive.

(iii) Conjunction. $P(a_1 \land a_2) = P(a_1 \mid a_2)\,P(a_2) = P(a_2 \mid a_1)\,P(a_1)$
for any statements $a_1$ and $a_2$.
In particular: $P(a_1 \land a_2) = P(a_1)\,P(a_2)$ if $a_1$ and $a_2$ are independent.
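
The three properties can be verified on a small example. The sketch below is an illustration only, written in Python with names (`OUTCOMES`, `P`, `P_given`) that are assumptions and not the book's notation. It uses the two-dice outcomes, with $a_1$: 'the sum is 11 or 12' and $a_2$: 'at least one digit is 6' (the statements of 5.9 Ex. 22), and represents each statement by its truth set.

```python
from fractions import Fraction
from itertools import product

# Universal set: 36 equi-probable outcomes of two dice.
OUTCOMES = set(product(range(1, 7), repeat=2))

def P(truth_set):
    """Probability measure of a truth set (equi-probable outcomes)."""
    return Fraction(len(truth_set), len(OUTCOMES))

def P_given(A1, A2):
    """Conditional probability P(a1 | a2) = mu(A1 & A2) / mu(A2)."""
    return P(A1 & A2) / P(A2)

A1 = {o for o in OUTCOMES if sum(o) in (11, 12)}   # sum is 11 or 12
A2 = {o for o in OUTCOMES if 6 in o}               # at least one 6

# (i) complement rule, (ii) disjunction, (iii) conjunction
complement_ok = P(OUTCOMES - A1) == 1 - P(A1)
disjunction_ok = P(A1 | A2) == P(A1) + P(A2) - P(A1 & A2)
conjunction_ok = (P(A1 & A2) == P_given(A1, A2) * P(A2)
                  == P_given(A2, A1) * P(A1))

# Independent statements: 'first die shows 6' and 'second die shows 6'.
B1 = {o for o in OUTCOMES if o[0] == 6}
B2 = {o for o in OUTCOMES if o[1] == 6}
independent = P(B1 & B2) == P(B1) * P(B2)

print(P_given(A1, A2), complement_ok, disjunction_ok, conjunction_ok, independent)
```

Here $a_1$ and $a_2$ are neither exclusive nor independent, so the general forms of (ii) and (iii) are needed; the statements about the separate dice are independent and the simple product applies.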


Note that P(~a)=1-P(a) follows from the particular case of (ii) 
since (~a) and a are exclusive and such that ~a v a is logically true. 
Hence: 


$P(\sim a)+P(a)=P(\sim a \lor a)=1$, i.e. $P(\sim a)=1-P(a)$.


The additive result (ii) means that the chance of either statement is
the sum of the separate chances if and only if the statements are 
exclusive. Otherwise, the chance of both statements must be allowed 
for. The multiplicative result (iii) means that the chance of both 
statements is the product of the separate chances if and only if the 
statements are independent. Otherwise the conditional probability 
of one statement given the other is involved. 

Property (ii) extends in an obvious way to the case of three or 
more statements. If they are not exclusive, the difficulty of allowing 
for overlap gets greater as more statements are taken together, as 
for counting sets in 4.6. If they are exclusive, the probabilities of the 
separate statements simply add. Moreover, statements $a_1, a_2, a_3, \ldots$
are exclusive and exhaustive if $a_r \land a_s$ is logically false ($r \ne s$) and if
$a_1 \lor a_2 \lor a_3 \ldots$ is logically true. Then:

$P(a_1) + P(a_2) + P(a_3) + \ldots = 1$.

Notice that, if $a_1, a_2, a_3, \ldots$ are not exclusive (whether they exhaust all
outcomes or not), it is perfectly possible that:

$P(a_1) + P(a_2) + P(a_3) + \ldots > 1$ for some $a_1, a_2, a_3, \ldots$.


Property (iii) can be developed to give a result of basic importance
in the problem of inference. Consider the conjunction of a statement
$a_r$ with another statement a. Then (iii) gives:

$P(a_r \mid a)\,P(a) = P(a \mid a_r)\,P(a_r)$ ...... (4)


Now suppose a set $a_1, a_2, a_3, \ldots$ of exclusive and exhaustive statements
is specified, given a, so that $\sum_r P(a_r \mid a) = 1$. In other words, if a is
known to be true, then one of $a_1, a_2, a_3, \ldots$ is true. The question is:
which one? The result (4) can be written for each r (r = 1, 2, 3, ...)
and added:

$P(a) \sum_r P(a_r \mid a) = \sum_r P(a \mid a_r)\,P(a_r)$, i.e. $P(a) = \sum_r P(a \mid a_r)\,P(a_r)$.

Write $\lambda = P(a)$, not depending on $a_r$. Then (4) becomes:

$P(a_r \mid a) = \frac{1}{\lambda}\,P(a \mid a_r)\,P(a_r) \quad (r = 1, 2, 3, \ldots)$ ...... (5)
The result (5), known as Bayes' Theorem after Bayes (d. 1761),
states that the posterior probability $P(a_r \mid a)$ is proportional to the
prior probability $P(a_r)$, multiplied by the chance of the known a on
$a_r$, $P(a \mid a_r)$. For any one statement $a_r$, its chance with nothing known
(i.e. the prior probability) is compared with its chance when the
evidence a is known (i.e. the posterior probability). In passing from
one to the other, the factor which comes in is $P(a \mid a_r)$, the chance of
the evidence a on the basis of the particular $a_r$. This factor varies
from one $a_r$ to another. The idea is to select that $a_r$ which has the
greatest posterior probability, i.e. the greatest $P(a \mid a_r)\,P(a_r)$. For this,
the prior probabilities $P(a_r)$ must be known.
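
As a numerical illustration of (5), the following Python sketch computes posterior probabilities from given priors and chances of the evidence. It is an assumption-laden sketch and not the book's own procedure: the function name `posterior` is invented here, and the figures are those of 5.9 Ex. 23 (prior chance 1/10 that B's hand is better; chance of a raise 9/10 with a better hand, 1/5 with a worse one).

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' Theorem (5): P(a_r | a) = P(a | a_r) P(a_r) / lambda,
    where lambda = P(a) = sum over r of P(a | a_r) P(a_r)."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    lam = sum(joint)  # the constant lambda = P(a)
    return [j / lam for j in joint]

# a_1: B's hand is better, a_2: B's hand is worse (exclusive, exhaustive).
priors = [Fraction(1, 10), Fraction(9, 10)]
# Evidence a: B raises the bet.
likelihoods = [Fraction(9, 10), Fraction(1, 5)]

post = posterior(priors, likelihoods)
print(post)
```

The posterior probabilities again add to 1, and the evidence raises the chance of the better hand from 1/10 to 1/3.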

So far, so good. Bayes' Theorem is not subject to criticism; it is an
established property of conditional probabilities. The difficulty is
that the prior probabilities $P(a_r)$ are not in fact known. A second
step is now taken: assume all $P(a_r)$ are equal to a constant $\mu$.
This is Bayes' Postulate, i.e. when nothing is known about prior
probabilities $P(a_r)$, assume them equal. Then (5) is:

$P(a_r \mid a) = \frac{\mu}{\lambda}\,P(a \mid a_r)$.

So, to select the greatest $P(a_r \mid a)$, we select $a_r$ with the greatest
$P(a \mid a_r)$. On Bayes' Postulate, the selection of the particular $a_r$ is
solely on the basis of $P(a \mid a_r)$, the chance of getting the known a
from $a_r$. This is one of the most controversial matters in the theory
of inference: the selection of $a_r$ from a set of possible outcomes, on the
evidence of a.


5.7. Examples of probability measure. The simplest and most familiar 
examples of probability (finite number of outcomes) are taken from 
such experiments as dice-throwing and card-dealing. To illustrate 
exclusive and non-exclusive statements, examine the digit (1, 2, 3, 4, 
5 or 6) shown in the throw of a die. The two statements: 


$a_1$: digit 1, 2 or 3 $\qquad$ $a_2$: digit 4, 5 or 6
have each the probability $\frac{1}{2}$ if the die has no bias. They are exclusive
and exhaustive; the probabilities add: $\frac{1}{2}+\frac{1}{2}=1$. But consider the
two statements:

$b_1$: digit 1, 2, 3 or 4 $\qquad$ $b_2$: digit 3, 4, 5 or 6.
Each has the probability $\frac{2}{3}$, so that
$P(b_1)+P(b_2)=\frac{2}{3}+\frac{2}{3}=\frac{4}{3}>1$.
This is so because $b_1$ and $b_2$ are not exclusive. To get $P(b_1 \lor b_2)$, use (ii):
$P(b_1 \lor b_2) = P(b_1) + P(b_2) - P(b_1 \land b_2) = \frac{2}{3} + \frac{2}{3} - \frac{1}{3} = 1$

since $b_1 \land b_2$ is digit 3 or 4 and $P(b_1 \land b_2) = \frac{1}{3}$.

To illustrate independent and non-independent statements, take 
a pack of 52 playing cards (with no bias), shuffle and deal two cards. 
What is the chance P of two aces? If the first card is replaced before 
the second is dealt (after a re-shuffle), the chance of an ace at each 
deal is $\frac{1}{13}$. The two deals are independent, neither being affected by
the other. By (iii):

$P = \frac{1}{13} \times \frac{1}{13} = \frac{1}{169}$ (168 to 1 against).

If the first card is not replaced, the chance of an ace at the second
deal depends on whether the first card was an ace or not; the two
cases giving chances of $\frac{3}{51}$ and $\frac{4}{51}$ respectively. The two deals are not
independent. By (iii):
$P = P(a_1)\,P(a_2 \mid a_1)$
where $a_1$ is the statement: ace on first deal, and $a_2 \mid a_1$ the statement:
ace on the second deal given an ace on the first. So:

$P = \frac{1}{13} \times \frac{3}{51} = \frac{1}{221}$ (220 to 1 against).


This is a smaller chance (longer odds against) than before. 
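
The two card calculations differ only in the conditional chance at the second deal. A minimal Python sketch (the variable names are illustrative assumptions) makes the comparison explicit with exact fractions:

```python
from fractions import Fraction

# Chance of an ace on the first deal from a full pack of 52 cards.
p_first_ace = Fraction(4, 52)

# With replacement: the deals are independent, so P = P(a1) P(a2).
p_with_replacement = p_first_ace * Fraction(4, 52)

# Without replacement: P = P(a1) P(a2 | a1); after one ace has gone,
# 3 aces remain among 51 cards.
p_without_replacement = p_first_ace * Fraction(3, 51)

print(p_with_replacement, p_without_replacement)
```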




It is to be noticed how the everyday concept of odds for or against
fits the probability or chance measure as a proper fraction:

Odds: p to q on statement a, equivalent to chance $P(a) = \frac{p}{p+q}$
      p to q against statement a, equivalent to chance $P(a) = \frac{q}{p+q}$

A fair bet then corresponds to evens (1 : 1), i.e. $P(a) = \frac{1}{2}$.
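
The conversion between odds and chances can be written as a pair of small functions. This is a sketch in Python with invented names (`chance_from_odds_on`, `chance_from_odds_against`, `odds_against`), shown only to make the two formulas concrete.

```python
from fractions import Fraction

def chance_from_odds_on(p, q):
    """Odds of p to q ON a statement give chance p/(p+q)."""
    return Fraction(p, p + q)

def chance_from_odds_against(p, q):
    """Odds of p to q AGAINST a statement give chance q/(p+q)."""
    return Fraction(q, p + q)

def odds_against(chance):
    """Return (p, q) such that the odds are p to q against."""
    chance = Fraction(chance)
    return (chance.denominator - chance.numerator, chance.numerator)

print(chance_from_odds_against(168, 1))   # two aces, with replacement
print(odds_against(Fraction(1, 221)))     # two aces, without replacement
print(chance_from_odds_on(1, 1))          # evens
```

Odds on and odds against are the same statement read from opposite sides: p to q on is q to p against.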

Such examples seem to be intuitively obvious. The question may 
be raised: why develop a complicated algebra of probabilities if the 
results are so obvious? It is always necessary (to the mathematician) 
to establish strictly what may appear obvious. However, there is 
more to probability theory than this, even when the number of 
outcomes is finite. There are results which are not obvious, where 
intuition can lead us seriously astray. Consider the following
question.* 

There have been 33 Presidents of the United States from Washing- 
ton to Eisenhower inclusive. What is the chance that at least two of 
them have the same birthday, i.e. the same day and month? Most 
people would guess a rather small chance. The theory of probability 
shows that the chance is rather greater than $\frac{3}{4}$, i.e. 3 to 1 on. In fact,
the statement that at least two Presidents have the same birthday 
is true; Polk and Harding were both born on 2 November. 

To prove the result more generally, consider a group of r people. 
Assume that, for each, the 365 days in the year are equally likely as 
birthdays and ignore leap years. The first person has a particular 
birthday; the chance that the second person has a different birthday 
is $\frac{364}{365}$, the chance that the third has still a different birthday is $\frac{363}{365}$,
and so on. The chance of r different birthdays:

$\frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365-r+1}{365} = \frac{364 \times 363 \times \ldots \times (365-r+1)}{365^{\,r-1}}$

is an application of (iii) in the non-independent case. If this proposition
is called a, the proposition that at least two of the r have the same
birthday is (~a). Then $P(\sim a) = 1 - P(a)$ gives:

$P_r = 1 - \frac{364 \times 363 \times \ldots \times (365-r+1)}{365^{\,r-1}}$


* Posed and analysed in Kemeny, Snell and Thompson: Introduction to Finite 
Mathematics (Prentice-Hall, 1957). 




for the chance that at least two of r have the same birthday. If
r = 33, $P_r = \frac{3}{4}$ approximately. If we ask what size group makes the
bet fair, we seek r for $P_r = \frac{1}{2}$. It is found that r = 23 ($P_r = 0.51$).
Hence, in betting on at least two people having the same birthday, 
the bet pays off for a group as small as 23 people. Most intuitive 
gamblers would be willing to bet against for groups larger than 23. 
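
The formula for $P_r$ is easily evaluated by machine. The sketch below, in Python, is an illustration under the same assumptions as the text (365 equally likely birthdays, leap years ignored); the function name `shared_birthday_chance` is an invention for this example.

```python
def shared_birthday_chance(r, days=365):
    """P_r = 1 - (364 x 363 x ... x (365 - r + 1)) / 365**(r - 1)."""
    p_all_different = 1.0
    for k in range(r):
        # The (k+1)th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

# Smallest group for which the bet on a shared birthday is at least fair.
smallest_fair_group = next(
    r for r in range(1, 366) if shared_birthday_chance(r) >= 0.5
)

print(smallest_fair_group)
print(round(shared_birthday_chance(23), 2))
print(round(shared_birthday_chance(33), 2))
```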


5.8. Finite stochastic processes. Consider a finite sequence of trials 
(or events, or experiments) at each of which there is a finite number 
of possible outcomes. If the outcomes at each stage of the sequence 
depend on chance, in a way to be specified in each case, then the 
sequence is called a finite stochastic process.* We assume no more 
than the following: at a particular trial (the rth out of n) we are given 
the results of all previous trials and we then know both the list of 
possible outcomes $a_1, a_2, a_3, \ldots$ and their probabilities $P(a_1)$, $P(a_2)$,
$P(a_3), \ldots$, where $P(a_1) + P(a_2) + P(a_3) + \ldots = 1$. The question to
which an answer is required is: at the end of the process, what is the 
probability of a specified outcome, by whatever route it is reached? 

A general formulation of the problem, while not difficult, involves 
a complicated and messy notation. Moreover, it results in little gain 
as compared with an analysis of each particular stochastic process 
as it is framed. In the following simple examples, the stochastic 
process in each case is represented by a Tree Diagram, a useful
device for all classification purposes.†

(i) Six £ notes and six $ bills are distributed between two boxes:
box A has three £ notes and one $ bill, box B three £ notes and five
$ bills. You select a box at random (by tossing a coin) and you are
then permitted to select at random one note or bill from the box.
What is the chance that you get a £ note? Since there are six of
each, you may guess that you have an even chance, but you would be wrong.

    TRIAL 1        TRIAL 2
    BOX            NOTE           CHANCE
    A (1/2)        £ (3/4)        1/2 x 3/4 = 3/8
    A (1/2)        $ (1/4)        1/2 x 1/4 = 1/8
    B (1/2)        £ (3/8)        1/2 x 3/8 = 3/16
    B (1/2)        $ (5/8)        1/2 x 5/8 = 5/16

    Fig. 5.8a

The tree diagram of Fig. 5.8a shows


* ‘Stochastic’ is derived from the Greek: stochos = guess. Roughly, it is an adjective 
corresponding to probability as a noun. The adjective is not ‘probable’ but rather 
‘probabilistic’ if such an awkward word could be accepted. 

† For other simple examples, see Kemeny, Snell and Thompson, op. cit.




the two trials involved. At trial 1 the outcomes are A and B, the
probabilities being $\frac{1}{2}$ for each, shown on the branches leading to A and B.
At trial 2, if A is the first outcome, then the outcomes are £ note
and $ bill with probabilities of $\frac{3}{4}$ and $\frac{1}{4}$ respectively, shown on the
branches leading from A. If B is the first outcome, the same two
outcomes are possible but with probabilities $\frac{3}{8}$ and $\frac{5}{8}$. Conditional
probability, property (iii) of 5.6, then gives the chance of a final outcome.
The chance of getting a £ note from box A is:
$P(b_1 \land a_1) = P(a_1)\,P(b_1 \mid a_1) = 1/2 \times 3/4 = 3/8$

where $a_1$ is ‘box A selected at trial 1’ and where $b_1$ is ‘£ note selected
at trial 2’. The other three chances are similarly obtained. These
outcomes are exclusive and their chances can be added. If $a_2$ is ‘box
B selected at trial 1’ and $b_2$ is ‘$ bill selected at trial 2’, then the
chances of getting a £ note or a $ bill by either route are:

$P(b_1) = P(b_1 \land a_1) + P(b_1 \land a_2) = 3/8 + 3/16 = 9/16$
$P(b_2) = P(b_2 \land a_1) + P(b_2 \land a_2) = 1/8 + 5/16 = 7/16$.

You have a better-than-even chance of getting the £ note; it is 9 to
7 on.
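
The tree-diagram calculation amounts to multiplying along each branch and adding over the routes to the same final outcome. A short Python sketch follows; the dictionary layout and the names `boxes` and `chance_of` are assumptions made for illustration, not part of the original.

```python
from fractions import Fraction

# Contents of the two boxes, as in example (i).
boxes = {
    "A": {"pound": 3, "dollar": 1},
    "B": {"pound": 3, "dollar": 5},
}
p_box = Fraction(1, 2)  # trial 1: box chosen by the toss of a coin

def chance_of(note):
    """Add P(box) * P(note | box) over both routes of the tree."""
    total = Fraction(0)
    for contents in boxes.values():
        p_note_given_box = Fraction(contents[note], sum(contents.values()))
        total += p_box * p_note_given_box
    return total

print(chance_of("pound"), chance_of("dollar"))
```

The two chances add to 1, since the final outcomes are exclusive and exhaustive.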
(ii) Three players A, B and C are at the tennis-club with darkness
descending and time for only one match. Lots are drawn to decide
on the odd man out, to umpire the match played by the other two. B
and C are evenly matched, each having an even chance of winning.
A is better, with a 75 : 25 chance of beating B or C. What chance
has A of winning? The tree diagram of Fig. 5.8b shows the chances of
the final outcome (by either route in each case):

    ODD MAN OUT    WINNER         CHANCE
    A (1/3)        B (1/2)        1/3 x 1/2 = 1/6
    A (1/3)        C (1/2)        1/3 x 1/2 = 1/6
    B (1/3)        A (3/4)        1/3 x 3/4 = 1/4
    B (1/3)        C (1/4)        1/3 x 1/4 = 1/12
    C (1/3)        A (3/4)        1/3 x 3/4 = 1/4
    C (1/3)        B (1/4)        1/3 x 1/4 = 1/12

    Fig. 5.8b

    Winner:          A      B      C
    Probability:     1/2    1/4    1/4

A fair bet needs to be laid with the odds: evens on A, 3 to 1 against
B and 3 to 1 against C.




(iii) You drive up to a road intersection O knowing that the place 
you want (W) is a short distance (but out of sight) along one of the 
three roads you see. The other roads lead to undesired spots A and B. 
You do not know which road you want and you draw lots to decide 
which you take (trial 1). If you decide wrongly, you return to O and 
draw lots again to decide between the other two roads (trial 2). If you 
again decide wrongly, you take the last remaining road (trial 3). The 
tree diagram of this stochastic process is Fig. 5.8c. In the end, the
chances of making 1, 2 or 3 journeys before getting to W turn out to
be equal, $\frac{1}{3}$ each.

    TRIAL 1          TRIAL 2          TRIAL 3      CHANCE
    W (1/3)                                        1/3
    A or B (2/3)     W (1/2)                       2/3 x 1/2 = 1/3
    A or B (2/3)     other (1/2)      W (1)        2/3 x 1/2 x 1 = 1/3

    Fig. 5.8c

One more general type of finite stochastic process is of considerable
importance. The process is a sequence of n trials, each being
independent and having two outcomes with probabilities p and q
(p + q = 1). If n = 3, and if one outcome is marked W (for win or
success) and the other L (for loss or failure), then the tree diagram of
Fig. 5.8d gives the final outcomes, according to the number of wins
(out of 3):

    No. wins |  0      1       2       3
    Chance   |  q^3    3pq^2   3p^2q   p^3

Similar results can be obtained for n = 4, 5, ... and they generalise
as follows (5.9 Ex. 26). In n trials, write P(r) for the probability of
exactly r wins (and n - r losses), in whatever order they come. Then

$P(r) = \binom{n}{r} p^r q^{n-r}$




where $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$ $(0 < r < n)$ and $\binom{n}{0} = \binom{n}{n} = 1$ (see 1.7 above).

    TRIAL 1    TRIAL 2    TRIAL 3    CHANCE
    W          W          W          p^3
    W          W          L          p^2 q
    W          L          W          p^2 q
    W          L          L          p q^2
    L          W          W          p^2 q
    L          W          L          p q^2
    L          L          W          p q^2
    L          L          L          q^3

    Fig. 5.8d


We obtain the probabilities P(0), P(1), P(2), ... P(n) for r = 0, 1, 2,
... n, called the Binomial Distribution. The sum of the probabilities
of these exclusive and exhaustive results is 1; this can be checked:

$\sum_r P(r) = q^n + \binom{n}{1} p\,q^{n-1} + \binom{n}{2} p^2 q^{n-2} + \ldots + p^n$
$= (q+p)^n$ by the Binomial Theorem*
$= 1$ since p + q = 1.
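
The Binomial Distribution and the check on its sum can be computed directly. The following is a minimal Python sketch; the function name `binomial_distribution` is an assumption for illustration, and `math.comb` (available in Python 3.8 and later) supplies the binomial coefficient.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """Return [P(0), P(1), ..., P(n)] with P(r) = C(n, r) p^r q^(n-r)."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

# The case n = 3 of Fig. 5.8d, with an even chance of a win at each trial.
dist = binomial_distribution(3, Fraction(1, 2))
print(dist, sum(dist))
```

With p = q = 1/2 the chances are in the ratio 1 : 3 : 3 : 1, the coefficients of the n = 3 case, and they add to 1 as required.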


For given n, there are (n + 1) values of $\binom{n}{r}$ as r ranges 0, 1, 2, ... n.
Display these values in a row and write successive rows for successive
values of the given n (n = 1, 2, 3, ...). The values of $\binom{n}{r}$ then appear
very conveniently in a triangular form, named after Pascal (1623-1662).
The construction of Pascal’s triangle follows from the result
(5.9 Ex. 27):


* The Binomial Theorem of elementary algebra is usually written:

$(1+x)^n = 1 + \binom{n}{1} x + \binom{n}{2} x^2 + \ldots + x^n$.

Here put $x = p/q$ and multiply through by $q^n$.




Pascal’s Triangle for $\binom{n}{r}$

         r=0    1    2    3    4    5    6  ...
    n=1    1    1
      2    1    2    1
      3    1    3    3    1
      4    1    4    6    4    1
      5    1    5   10   10    5    1
      6    1    6   15   20   15    6    1
      ...


Each entry is the sum of two entries in the row above, i.e. one 
immediately above and one above and to the left. 


Pascal’s triangle provides the quickest way of finding $\binom{n}{r}$ for
various n and r, when n and r are small positive integers. In the
example of the Binomial Distribution (n = 3), the coefficients of the
various powers of p and q are $\binom{3}{r}$ for r = 0, 1, 2, 3. These are read off
Pascal’s triangle as 1, 3, 3, 1.
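
The construction rule (each entry is the sum of the entry immediately above and the entry above and to the left) translates directly into a short routine. This Python sketch is illustrative only; the name `pascal_rows` is an assumption.

```python
def pascal_rows(n_max):
    """Rows n = 1, 2, ..., n_max of Pascal's triangle."""
    rows = []
    row = [1, 1]  # the row for n = 1
    for _ in range(n_max):
        rows.append(row)
        # Interior entries: entry above-left plus entry above; ends are 1.
        row = [1] + [row[r - 1] + row[r] for r in range(1, len(row))] + [1]
    return rows

for n, row in enumerate(pascal_rows(6), start=1):
    print(n, row)
```

Each row adds to $2^n$, which is the check asked for in 5.9 Ex. 29.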


5.9. Exercises 


1. For p: equity prices are high and q: all equity prices are rising, translate
$p \land \sim q$, $\sim p \land \sim q$, $\sim(p \lor q)$ and $\sim(\sim p \lor \sim q)$ into words.

2. Consider the statements about adults: p: the person is a man, q: the
person is currently married and r: the person has been married at some time.
Express the following by use of $\sim$, $\lor$, $\land$ and $\to$: (a) she is a woman, (b) the
person is either married now or has been at some time, (c) the person is
widowed or divorced, (d) the woman is widowed or divorced, (e) if he is
married now, then he has been married at some time.

3. In Ex. 2, show that $\sim p \land \sim q$ refers to a single, widowed or divorced
woman but $\sim p \land \sim r$ to a single woman.

4. In example (ii) of 5.1, show that there are three logical possibilities for
rising prices as in the Truth Table:

    Equity prices            q    r    ~q    ~q ∧ r    ~q ∨ r
    all rising               T    T
    some, not all, rising    F    T
    none rising              F    F

Complete the table. Deduce that $\sim q \lor r$ is logically true, but not $\sim q \land r$.
Express in words.




5. In Ex. 4, show that $q \lor (\sim q \land r)$ is equivalent to r.

6. Make up a Truth Table for the six logical possibilities (man/woman;
single/married/widowed or divorced) of Ex. 2 and show in it the truth or
otherwise of $\sim p \land q$, $q \lor r$ and $q \to r$. Show that q implies r and that $q \lor r$ is
equivalent to r.

*7. The Notation $\underline{\vee}$. Distinguish between ‘p or q or both’ and ‘p or q but not
both’. Show that, while the first is $p \lor q$, the second needs to be shown
$\sim(p \land q) \land (p \lor q)$, sometimes written $p \,\underline{\vee}\, q$. Illustrate the difference with
reference to p and q of example (i) of 5.1. For example (iii) of 5.1 show
that p Vq is equivalent to “2 Δ φ. Interpret in words and in terms of truth 
sets. 

8. If p and q are any statements, show that $p \lor \sim p$ is logically true, and
equally $p \lor (\sim p \lor q)$. In terms of truth sets, show $P \cup P' = U$ and that

$P \cup (P' \cup Q) = (P \cup P') \cup Q = U \cup Q = U$.

In what sense is q here a red herring?

9. The truth set of $p \to q$ is $P' \cup Q$. Deduce that the truth set of $p \leftrightarrow q$ is
$(P' \cup Q) \cap (P \cup Q') = (P \cap Q) \cup (P' \cap Q')$ and illustrate on a Venn Diagram.
Show that, if the truth set of $p \leftrightarrow q$ is U, then P = Q (p and q equivalent).

10. Draw Venn Diagrams to show that $p \to q$, $\sim p \lor q$ and $\sim q \to \sim p$ have the
same truth set. Interpret.

11. Given: the product of an even digit m with any digit is even. Prove by
reductio ad absurdum that m and n are both odd if mn is odd.

12. Show that $(p \land \sim q) \to \sim p$ is equivalent to $p \to q$. Devise a method of
proof of $p \to q$ which starts by taking p in conjunction with $\sim q$.

13. Show that the assertion is valid: if $p \lor q$ and $\sim q$, then p. Illustrate by
truth sets.

14. Consider the argument: everything medicinal is vile, therefore every- 
thing vile is medicinal. If the first is true, show that a necessary condition for 
something to be medicinal is that it is vile; and that a sufficient condition for 
something to be vile is that it is medicinal. 

15. Summarise Euclid on congruence by stating three equivalent conditions 
for two congruent triangles (all necessary and sufficient): three sides equal; 
two sides and included angle equal; one side and two angles equal. Show that 
the condition that two angles are equal is necessary and sufficient for similar 
triangles, but necessary only (not sufficient) for congruent triangles. 

*16. Reflexive sets. Use the results of 4.7 and 4.8 to show that a necessary 
condition for a countably infinite set is that it is reflexive but that the Axiom of 
Choice is needed to establish that a necessary and sufficient condition for an 
infinite set is that it is reflexive. 

17. Feeding pennies into a slot machine has the possible outcomes: (a)
nothing back, (b) penny returned, (c) a prize won, (d) penny returned and prize
won. Attempt to assign weights to the truth sets A, B, C and D given that the
probability of a is 1/2 and that of b or c (but not both) 5/12. Show that you can
put the odds as evens on getting something and 11-1 against getting a prize
and money back, but that you cannot write the chance of penny returned or
that of winning a prize. What additional data are needed for this?

18. It is given that P(a,A ας) =P(~a,) =}; P(a,) =4. What is P(a, Va,)? 

19. In a race (won by one horse only), a bookie quotes 2 : 1 against horse A 
and 5 : 1 against horse B. What odds should he offer on A or B winning? 

20. Consistent statements. Statements $a_1$ and $a_2$ are said to be consistent if
$P(a_1 \land a_2) \ne 0$. From property (ii) of 5.6, establish that, if $P(a_1) + P(a_2) > 1$,
then $a_1$ and $a_2$ are consistent.

21. There are six empty seats in a row at a cinema; three people take seats 
at random. What is the chance that they leave no empty seats between them? 
What is the chance that they leave three adjacent seats empty? 

22. In example (i) of 5.4, the chance that 11 or 12 is the sum of the two digits 
shown by the dice is 3/36 = 1/12. Show that it is 3/11 given that at least one of 
the digits is 6, and that, if the first throw gives 6, it is improved to 1/3. Why? 

23. A and B are playing poker. A bets and, given his hand (a strong one), 
the chance of B’s hand being better is 1/10. If B has a better hand, the chance 
that he raises the bet is 9/10; if B’s hand is worse, the chance that he raises is 
1/5. B does raise the bet; use Bayes' Theorem of 5.6 to show that the probability
that B has a better hand is now to be put at 1/3. 

24. Four people leave umbrellas on the hat-stand of a restaurant. Absent- 
mindedly, the first three to leave pick umbrellas at random and the fourth 
takes the last one. Show that the chance that no one gets his own umbrella is 3/8. 

25. The audience at a small political meeting is 10 on one side of the aisle 
(7 Conservative, 3 Labour) and 10 on the other side (5 Conservative, 5 Labour). 
In taking a straw vote, the candidate tosses a coin to decide which side of the 
aisle to go, and then he selects at random two people, one after the other. 
Draw a Tree Diagram to show the outcomes and their chances. Show that the 
chance of getting two Conservatives is 31/90. 

26. Binomial Distribution. For 4 trials, draw a Tree Diagram similar to Fig.
5.8d and show $P(r) = \binom{4}{r} p^r q^{4-r}$ (r = 0, 1, 2, 3, 4). By induction, prove that
$P(r) = \binom{n}{r} p^r q^{n-r}$ for n trials.

27. Pascal’s Triangle. Establish the construction by showing, from the
definition of $\binom{n}{r}$, that $\binom{n+1}{r} = \binom{n}{r} + \binom{n}{r-1}$ for any positive integral n and
$r \le n$.

28. Show that the ratio of $\binom{n}{r}$ to $\binom{n}{r-1}$ is $\frac{n-r+1}{r}$ for any positive integral
n and $r \le n$. Deduce that $\binom{n}{r} > \binom{n}{r-1}$ for $r = 1, 2, \ldots, \frac{n}{2}$ (n even). What
corresponds when n is odd? Interpret in terms of a row of Pascal’s Triangle.

29. In the Binomial Theorem for $(1+x)^n$, put x = 1 to show that $\sum_{r=0}^{n} \binom{n}{r} = 2^n$
for n a positive integer. Check from the rows of Pascal’s Triangle for n = 1 to 6.


CHAPTER 6 


GROUPS AND FIELDS 


6.1. The structure of a set. In Chapters 2 and 3 we tried to disclose the 
basic concepts of number systems, and of sets of polynomials, and to 
lead towards a precise formulation of them. In such a mathematical 
development, there must be certain assumptions in the form of 
undefined properties and of definitions of terms and concepts. There 
are postulates and definitions to be framed. It is essential that these 
should be consistent, that nothing is in conflict with anything else. 
But it is also desirable that they should be elegantly and economically 
laid out, that they should be as concise, as simple and as general as 
they can be made. In particular, they should be independent in the 
sense that nothing is a consequence of anything else, and minimal in 
that they are reduced to the barest essentials. 

For example, it is quite consistent to define an ordered field as 
something which satisfies all the rules and properties of 2.2. But it 
would be wasteful and not very enlightening to do so. No attention 
is paid to which of the rules and properties are derivable from others, 
nor to the minimal definitions of sets which fall short of being 
ordered fields. It is time to be more systematic, to simplify and to 
generalise concepts. 

Mathematics has to do largely with the structure of a set. How are 
the elements related? Can they be added and multiplied? Is there an 
order among them? If so what are the properties involved? The 
properties of structure to pursue first, and mainly, are those concerned
with binary operations, like + and ×, by means of which one
element is obtained as a combination of others. For sets of numbers,
+ and × are ordinary addition and multiplication. For polynomials,
the operations are a little more involved. In saying that $a_1x^2 + b_1x + c_1$
plus $a_2x^2 + b_2x + c_2$ is a third polynomial $(a_1 + a_2)x^2 + (b_1 + b_2)x + (c_1 + c_2)$,
we mean that the sum rule is: the sets of three coefficients $(a_1, b_1, c_1)$
and $(a_2, b_2, c_2)$ add to the set $(a_1 + a_2,\ b_1 + b_2,\ c_1 + c_2)$. Each of the three
coefficients is subject to separate addition. Another and less straightforward
rule applies to products of polynomials. For subsets of a
given universal set (4.2) the operations are less familiar; in combining
subsets we have union $\cup$ and intersection $\cap$. These have some
similarity with sums and products, by no means exact but enough to
make it worth while to use the notation + for $\cup$ and × for $\cap$.

Binary operations, then, are of various kinds. Addition and multi- 
plication are the most common, either in their ordinary forms or as 
suitable descriptions of operations which are sufficiently similar. 
We must keep an open mind on what operational rules will be 
obeyed; the rules of Boolean algebra (4.3) are no less valid or 
reasonable than those of ordinary algebra (2.2). From this point of 
view, an integral domain (like the integers) or a field (like the 
rationals) is a very specialised kind of set, satisfying a whole list of 
particular, though familiar, rules. Other sets may not be so obedient. 
In examining the structure of sets generally, we find it profitable to 
go back to something simpler, and in particular to consider first a 
single binary operation and not the conjunction of two such opera- 
tions. This single operation may be addition or multiplication, or 
something more or less like one of them. We start from a completely 
neutral position: we take a binary operation denoted *, whatever it 
may be. 


6.2. Groups. A set {a, b, c, ...} is considered with respect to a binary
operation *. The operation is a rule of combination applied to any two
elements a and b to give another element written $a * b$. In a particular
case, it must be specified precisely, e.g. by writing a table showing all
combinations, as with the ordinary multiplication table. Most sets
in mathematics are of a kind called ‘groups’, with a structure of a
particularly simple but general nature in terms of the specified
binary operation. This structure is that the set obeys one of the
columns of the operational rules of 2.2, i.e. the rules in their application
to one binary operation. More precisely, there are four rules to
be obeyed. The first is closure: every pair of elements of the set must
be capable of combination by the operation * and the result in each
case is also an element of the set. There are no exceptions and nothing
new is produced. Another is the associative property that the grouping
of three elements in combination is immaterial: $a * (b * c) = (a * b) * c$,
either being written $a * b * c$. A third is that there is an identity
element e in the set, such that any element a is unchanged when
combined with the identity element: $a * e = e * a = a$. The last is that
every element has an inverse in the set, that each a has an inverse
$a^{-1}$: $a * a^{-1} = a^{-1} * a = e$. When an element and its inverse are
combined, the identity element results. From these properties,
another follows: cancellation. If $a * b = a * c$, then $b = c$. Notice that
nothing has yet been said about the commutative property that the
order of combination of two elements does not matter: $a * b = b * a$.
This is not laid down as a requirement for a group. It can be regarded
as an extra, desirable but not essential. Many groups do obey the
commutative rule; they are ‘commutative groups’. There are other
groups which are not so obedient.

We have not yet arrived at a strict and economical definition of a 
group. While the properties mentioned are all consistent, some of 
them can be deduced from others. This is true of cancellation as 
already indicated. It is also true of part of the rule for the identity;
if $e * a = a$, then $a * e = a$. Similarly for the inverse: if $a^{-1} * a = e$,
then $a * a^{-1} = e$. Finally, it is not necessary to lay down either that
the identity e is unique or that the element a has a unique inverse
$a^{-1}$. It is certainly desirable that there should be one and only one
identity (inverse) but it is not necessary to specify it. The uniqueness 
follows as a consequence of the other rules. (For proofs see 15.3.) 

Hence, as a strictly axiomatic definition: 


DEFINITION: A set G = {a, b, c, ...} of elements of any specified kind,
in which a binary operation * is specified, is a group if the four
postulates shown below are satisfied.

For any elements a, b, c, ... of a group G:

Property     | Postulates                              | Operational Rules
Closure      | $a * b \in G$                           | $a * b \in G$
Associative  | $a * (b * c) = (a * b) * c$             | $a * (b * c) = (a * b) * c = a * b * c$
Commutative  | ......                                  | $a * b = b * a$ may be true for all $a, b \in G$ (commutative group) or it may not (non-commutative group)
Identity     | there is an element $e \in G$ such that $e * a = a$ | the identity e is unique and $a * e = e * a = a$
Inverse      | there is an element $a^{-1} \in G$ such that $a^{-1} * a = e$ | the inverse $a^{-1}$ is unique and $a * a^{-1} = a^{-1} * a = e$
Cancellation | ......                                  | if $a * b = a * c$, then $b = c$




The operational rules shown comprise the four postulated properties 
with the addition of others which follow from the postulates. They 
are in line with the rules of 2.2. 
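
For a finite set, the four postulates can be checked mechanically by trying every combination. The sketch below is a Python illustration with assumed names (`is_group`, `add_mod_6`, `mul_mod_6`); the examples, the integers modulo 6 and modulo 5, are chosen here for convenience and are not taken from the text at this point.

```python
from itertools import product

def is_group(elements, op):
    """Check the four group postulates for a finite set under op."""
    elements = list(elements)
    # Closure: a * b is in the set for every pair a, b.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associative: a * (b * c) = (a * b) * c for every triple.
    if any(op(a, op(b, c)) != op(op(a, b), c)
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with e * a = a for all a.
    identities = [e for e in elements if all(op(e, a) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverse: for each a, some b with b * a = e.
    return all(any(op(b, a) == e for b in elements) for a in elements)

def add_mod_6(a, b):
    return (a + b) % 6

def mul_mod_6(a, b):
    return (a * b) % 6

def mul_mod_5(a, b):
    return (a * b) % 5

print(is_group(range(6), add_mod_6))      # integers modulo 6 under addition
print(is_group(range(6), mul_mod_6))      # fails: 0 has no inverse
print(is_group([1, 2, 3, 4], mul_mod_5))  # non-zero residues modulo 5
```

Only the postulates are tested; uniqueness of the identity and of inverses, and cancellation, then follow as consequences and need no separate check.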

From the properties of number systems, it is seen that the integers, 
rationals, real and complex numbers are all groups under addition. 
The set of positive integers is not a group since there is no identity 
(zero) and no inverses (negatives). If the operation considered is
multiplication, then the rationals, real and complex numbers (with zero
excluded in each case, since zero has no reciprocal) are all
groups. However, the set of integers is not a group under multiplication;
there is an identity (the integer 1) but no integer other than 1 and -1 has an
inverse (reciprocal). Similarly, the set F[x] of polynomials is a group
under addition, but not a group under multiplication since again
reciprocals are lacking.

A group G = {a, b, c, ...} has all kinds of subsets. The question is:
does a subset of G itself satisfy the four postulates for a group? If so,
it is a group in its own right, and is called a subgroup. It is necessary
that the elements of a subgroup K of a group G satisfy all the four 
postulates of a group. On the other hand, we can get by with less 
since many of the properties of G carry over into the subset. It is 
sufficient to write two conditions only: 


THEOREM: A subset K of a group G = {a, b, c, ...} is a subgroup under
the conditions: (a) if a, b ∈ K, then a * b ∈ K; (b) if a ∈ K, then a⁻¹ ∈ K.
A subgroup K of G must contain the identity e of G.


Proof: if K is a subgroup then the four postulated properties hold
and these include (a) and (b). On the other hand, if (a) and (b) hold,
then K is closed under * by (a) and two applications give: if a, b, c ∈ K,
so do a * (b * c) and (a * b) * c. These are the same since they are so
in G itself (associative property for K). Again, if a ∈ K then (b) gives
a⁻¹ ∈ K and (a) gives a⁻¹ * a ∈ K. But a⁻¹ * a = e in G and so e ∈ K. This,
with the fact that a⁻¹ ∈ K, shows that the identity and inverse
properties hold for K. Hence, by this process of checking each
postulate (property) in turn, it follows that K is a group. Q.E.D.
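
For a finite group the two conditions of the Theorem can be checked
mechanically. The following is a minimal sketch in Python, not part of
the original text; the function name is_subgroup and the choice of the
integers (mod 6) under addition as the parent group are assumptions
made for illustration only.

```python
from itertools import product

def is_subgroup(K, op, inverse):
    """Check conditions (a) and (b) of the Theorem for a finite subset K."""
    K = set(K)
    # (a) closure: a * b must lie in K for all a, b in K
    closed = all(op(a, b) in K for a, b in product(K, repeat=2))
    # (b) inverses: the inverse of each a in K must lie in K
    has_inverses = all(inverse(a) in K for a in K)
    # the proof picks an element a of K, so K must not be empty
    return bool(K) and closed and has_inverses

# Assumed parent group: the integers (mod 6) under addition.
n = 6
add = lambda a, b: (a + b) % n
neg = lambda a: (-a) % n

print(is_subgroup({0, 2, 4}, add, neg))  # True
print(is_subgroup({0, 1, 2}, add, neg))  # False: 1 + 2 = 3 is missing
```

The associative property is not tested, since (as in the proof) it is
inherited from the parent group.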

It is to be noticed that a set of a single element {e} is always a
group, for e serves as the identity and its own inverse: e * e = e.
Further, a group G has at least one subgroup, i.e. the set of one
element {e}.

The following three examples of groups under addition serve to
indicate the variety of sets which are groups. There are infinite
groups, as in example (i), and finite groups as in (ii). There are groups
of numbers and groups of other kinds of elements as in (iii). In all
practical cases, groups under addition are commutative and they
can then be called simply additive groups. When the operation is
addition, the symbol * used above is replaced by + and the identity
e by zero (0). The inverses in the group are then the negatives and
written a⁻¹ = −a where (−a) + a = 0. Since negatives exist, the inverse
operation of subtraction can be performed: a − b = a + (−b).

(i) The set J of integers. This is a commutative group under addition 
since all the rules (including the commutative one) are obeyed. The 
identity is zero 0 and the inverse of any integer m is the negative 
(—m). Of the subsets of J, many are not groups, including the subset 
J+ of positive integers which lacks an identity (zero) and inverses. 
The subset of even integers {..., −4, −2, 0, 2, 4, ...} is a group, a
subgroup of J. This follows since the conditions (a) and (b) of the
Theorem above are satisfied; the sum of two even integers and the 
negative of an even integer are even integers. The same is true of the 
subset of J consisting of all multiples of 3, or indeed of any integer. 
See 6.9 Exs. 2 and 12. 

(ii) The set of integers (mod n). This is a commutative group under 
addition. The sum of two or more integers is a member of the set and 
the order of adding is immaterial (associative and commutative). 
The identity is 0, the inverse of 0 is 0, and the inverse of m (≠ 0) is
n − m, since m + (n − m) = n = 0 (mod n). Consider {0, 1, 2, 3, 4}
(mod 5) as an instance. In a finite set with a specified (and relatively 
small) number of elements, the table of the operation can be written 
and used directly to demonstrate whether or not the set is a group. 


Here:

+ | 0  1  2  3  4
--+---------------
0 | 0  1  2  3  4
1 | 1  2  3  4  0
2 | 2  3  4  0  1
3 | 3  4  0  1  2
4 | 4  0  1  2  3


That 0 is the identity is seen from the (unchanged) elements in the
row 0. That each element has an inverse (negative) is seen from the
fact that 0 appears just once in each row, the corresponding elements
being each other's negatives (e.g. 1 and 4; 2 and 3). That the group
is commutative is seen from the fact that the table is symmetric
about the leading diagonal.
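
The three readings of the table can be reproduced directly. The
following is a minimal sketch in Python, not taken from the original
text; the variable names are illustrative assumptions.

```python
n = 5
elements = list(range(n))

# table[a][b] holds a + b (mod n), i.e. row a and column b of the table
table = [[(a + b) % n for b in elements] for a in elements]

# Row 0 is unchanged, so 0 is the identity.
identity_row_unchanged = table[0] == elements

# 0 appears just once in each row, so every element has a negative.
zero_once_per_row = all(row.count(0) == 1 for row in table)
negatives = {a: table[a].index(0) for a in elements}

# Symmetry about the leading diagonal means the group is commutative.
symmetric = all(table[a][b] == table[b][a] for a in elements for b in elements)

print(identity_row_unchanged, zero_once_per_row, symmetric)  # True True True
print(negatives)  # {0: 0, 1: 4, 2: 3, 3: 2, 4: 1}
```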

(iii) The set S of all subsets of a given universal set U. If the union of
two subsets is defined (as in 4.2) as all elements in either or both of the
subsets, then it is seen at once that S is not a group under addition
(union). From the operational rules for sets, if S = {A, B, C, ...}, then:

A + B ∈ S (closure);   A + (B + C) = (A + B) + C (associative);
A + 0 = A (identity)

all hold for union (+), the empty set being written 0 (zero). But there
is no inverse of any given (non-empty) subset A; since A + B = 0
is not possible, there is no subset B to serve as the negative of A.

However, addition (union) can be re-defined so that the properties
of closure, associative and identity still hold and so that the inverse
property also holds. The addition (union) of A and B is taken as the
set (A + B) of all elements in A or in B but not in both, as shown in
the Venn Diagram of Fig. 6.2. The figure also shows that the
associative property is still true, as also are closure and the identity
(the empty set 0). Further:

A + A = 0.

Each subset A is its own inverse: (−A) = A.

The set S = {A, B, C, ...} is a group under addition. The peculiar
feature is that each subset A serves as its own negative. Hence,
subtraction and addition are by definition the same: A − B = A + B.
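
The re-defined addition is what is usually called the symmetric
difference of two sets. The following is a minimal sketch in Python,
not part of the original text, which checks the group properties by
brute force; the universal set U = {1, 2, 3} and all names are
assumptions chosen for illustration.

```python
from itertools import combinations, product

U = {1, 2, 3}  # assumed small universal set
# S is the set of all subsets of U, held as frozensets
S = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]
empty = frozenset()

def add(A, B):
    """Re-defined addition: elements in A or in B but not in both."""
    return A ^ B

closure = all(add(A, B) in S for A, B in product(S, repeat=2))
associative = all(add(A, add(B, C)) == add(add(A, B), C)
                  for A, B, C in product(S, repeat=3))
identity = all(add(A, empty) == A for A in S)
self_inverse = all(add(A, A) == empty for A in S)

# Ordinary union, by contrast, gives no negative for a non-empty subset.
union_has_inverses = all(any(A | B == empty for B in S) for A in S)

print(closure, associative, identity, self_inverse)  # True True True True
print(union_has_inverses)  # False
```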

[Fig. 6.2. Venn diagram illustrating A + (B + C) = (A + B) + C]

Of even more interest are groups under multiplication. When the
operation is the product of two elements, the neutral symbol * is
replaced by × (which can then be dropped when there is no ambiguity)
and the identity e becomes unity (1). Sets of numbers may or
may not be groups under ×, as the examples below show, but if they
are then the commutative property also holds. Groups of numbers
are commutative groups. The same is not necessarily true of groups
of other kinds of elements under ×, which is why groups are so
interesting.* The illustration of non-commutative groups is so
important that it is left for separate discussion in the following two
sections. Any group under × (commutative or not) is such that
every element a has an inverse or reciprocal a⁻¹. But it is only for a
commutative group that the inverse operation of division can be
uniquely performed: a/b = ab⁻¹ = b⁻¹a.

(iv) The set R of rationals (excluding 0). This is a commutative 
group under multiplication since all the rules, including the com- 
mutative one, are obeyed. Of subsets of R, the set J of all integers is 
an example of one which is not a multiplicative group since it lacks 
reciprocals; J is a group under addition but not under multiplication. 
On the other hand, the set {2^m | m an integer} is a group under
multiplication, but not under addition. 

(v) The set {—1, 1}. This is the simplest of all groups under x, 
apart from the case of a set comprising the element 1 alone. It is a 
group since all the four rules hold, each element being its own re- 
ciprocal; it is also commutative. It is another subgroup of the group 
R. It is also a subset of J, i.e. J itself is not a group under x but does 
have a subset which is a group. 

* Note that, if G = {a, b, c, ...} is a non-commutative group under ×,
then a × b = b × a is not generally true, i.e. not true of all a and b
in G. It is true of some a and b and, in particular, it is true of the
identity (a × 1 = 1 × a) and of reciprocals (a × a⁻¹ = a⁻¹ × a).

(vi) The set of integers (mod n), again excluding 0. This is a group
under × if n is prime but it is not a group otherwise. The general
result (6.9 Ex. 9) can be illustrated in two cases where the
multiplication tables, shown here, demonstrate whether or not the group
properties hold.

n = 4                    n = 5

× | 1  2  3              × | 1  2  3  4
--+---------             --+------------
1 | 1  2  3              1 | 1  2  3  4
2 | 2  0  2              2 | 2  4  1  3
3 | 3  2  1              3 | 3  1  4  2
                         4 | 4  3  2  1

For n = 4, they do not; there is a failure to jump the first hurdle.
The set is not closed since 2 × 2 = 0, not an element of the
set. The set also lacks reciprocals since there is no r for 2 × r = 1, i.e. 2
has no reciprocal. The set of integers (mod 4) is not a group under ×
even when 0 is excluded. For n = 5, the set is far more obedient. It is
closed, satisfies the associative and commutative rules, has an
identity 1 and any member has a reciprocal. The reciprocal is to be
found by seeking an entry 1 in any row of the multiplication table;
and every row has just one entry 1. The integers (mod 5) with 0
excluded form a commutative group under multiplication. These
results are a re-interpretation of those of 2.7 above.
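
The contrast between n = 4 and n = 5 can be tested for other values
of n. The following is a minimal sketch in Python, not part of the
original text; the function name is an assumption, and the associative
rule and the identity 1 are taken for granted since they carry over
from ordinary integers.

```python
def is_group_under_multiplication(n):
    """Test closure and reciprocals for {1, ..., n-1} under x (mod n), n >= 2."""
    elements = range(1, n)
    # closure: a product must never fall on the excluded element 0
    closed = all((a * b) % n in elements
                 for a in elements for b in elements)
    # reciprocals: each a needs some r with a x r = 1 (mod n)
    has_reciprocals = all(any((a * r) % n == 1 for r in elements)
                          for a in elements)
    return closed and has_reciprocals

print(is_group_under_multiplication(4))  # False, since 2 x 2 = 0 (mod 4)
print(is_group_under_multiplication(5))  # True
print([n for n in range(2, 13) if is_group_under_multiplication(n)])
# [2, 3, 5, 7, 11]
```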

(vii) The set S of all subsets of a universal set. Consider the operation 
of products (intersection) with the identity element 1 taken as the 
universal set. The operational rules for sets show that the properties 
of a group hold, except that reciprocals are lacking. If A is a proper 
subset of the universal set, there is no subset B such that A x B=1. 
S is not a group under multiplication. 

The group {−1, 1} corresponds to handling odd and even, writing
odd = −1, even = 1 and taking products. A combination of odd
and even is odd (−1 × 1 = −1); other combinations are even
(−1 × −1 = 1 × 1 = 1). It is also the simplest case of a 'cyclic' group:


DEFINITION: A group G under multiplication is cyclic if the elements
{a, a², ..., a^(n−1), a^n} are generated by powers of a single element a,
which is such that a^n = 1 (identity), for some positive integer n.


Note that the finite set of elements is closed, as it must be for a group:

a^s a^t = a^(s+t) = a^(qn+r) = (a^n)^q a^r = 1^q a^r = a^r ∈ G

where s and t are positive integers (≤ n), where s + t = qn + r having
remainder r (< n) and where a^n = 1 is the essential property to ensure
closure (if r = 0, then a^r = 1 = a^n). All the other properties of a
group are satisfied, including the commutative rule.

Cyclic groups are not as uncommon as might be thought. Consider
the following cyclic groups of elements from the field of complex
numbers. Put a = −1, giving the cyclic group {−1, 1}. Put

a = ω = ½(−1 + i√3),

so that ω² = ½(−1 − i√3) and ω³ = 1, giving the cyclic group {ω, ω², ω³}.
Put a = i, giving the cyclic group {i, −1, −i, 1}. The essential
property in the last case is i⁴ = 1, from the definition i² = −1. These
examples are the cyclic groups of the nth roots of unity, for n = 2, 3, 4.
They suggest that the nth roots of unity are a cyclic group for any
positive integral n. This is so, from the theorem of 3.8. There are also
cyclic groups under addition (6.9 Ex. 8).
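
The nth roots of unity can be generated numerically as powers of a
single element. The following is a minimal sketch in Python, not part
of the original text; it assumes the generator a = cos(2π/n) + i sin(2π/n)
and compares floating-point values only within a small tolerance.

```python
import cmath

def roots_of_unity(n):
    """Return a, a^2, ..., a^n for the assumed generator a = exp(2*pi*i/n)."""
    a = cmath.exp(2j * cmath.pi / n)
    return [a ** k for k in range(1, n + 1)]

def close(z, w, tol=1e-9):
    """Floating-point comparison of two complex numbers."""
    return abs(z - w) < tol

def is_closed(roots):
    """Every product of two roots must again be one of the roots."""
    return all(any(close(x * y, z) for z in roots)
               for x in roots for y in roots)

for n in (2, 3, 4):
    roots = roots_of_unity(n)
    # the last power a^n is the identity 1, and the set is closed
    print(n, close(roots[-1], 1), is_closed(roots))
```

For n = 3 the first element is ω = ½(−1 + i√3) and for n = 4 it is i,
in agreement with the examples above.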




6.3. Transformations. The object here is to illustrate some of the 
great variety of types of transformations used in diverse branches of 
mathematics and to view them from one aspect: a set of trans- 
formations as a group.* The group operation is multiplication, taken 
as two transformations applied one after the other in succession. 

(i) Translations and magnifications of a figure. Start with a given
figure, say a square located in a plane (A in Fig. 6.3). Stretch the
figure in a given direction (horizontally in B of Fig. 6.3) and let the
stretch be in a given ratio a : 1 (a > 0). This is a transformation of
the square into a rectangle, a magnification in one direction. There
are various transformations according to the values of a (a > 1,
magnification; a < 1, squeeze; a = 1, no change). Alternatively, take
the square and shift it bodily in a given direction (horizontally in C
of Fig. 6.3) and let the shift be a given distance b. This is a
transformation of the square into another square, a translation in
one direction. The value of b can vary (b > 0, shift to right; b < 0,
shift to left; b = 0, no change).

Given two transformations, they are combined by applying one and then
the other. Two magnifications, first by a : 1 and then by α : 1,
give a combined magnification of aα : 1. Here the order does not
matter (commutative). Two shifts, first by b and then by β, give
a combined shift of b + β, and again the order does not matter. The
position is different when a magnification and a shift are combined
and it is most easily explored in algebraic terms.

[Fig. 6.3. A square (A), its magnification (B), its translation (C) and
a combined magnification and shift (D), referred to axes Ox and Oy]

* Another approach to transformations is followed in 7.5 below.

Fix axes Ox and Oy in the plane and units for measurement along
Ox and Oy. Any point (e.g. the corner of the given figure) is then shown
by a pair of co-ordinates (x, y). If the transformations are in the
direction Ox, then they leave y unchanged but change x into x′
according to the rules:

(1) Magnification by a : 1      x′ = ax.
(2) Translation by +b           x′ = x + b.


Apply (1) and then (2), giving a combined transformation (2) × (1),
where the order is read from right to left (see 1.7). This is from x,
through x′ to x″:

x′ = ax and x″ = x′ + b give x″ = ax + b.

This is a more general transformation, a magnification and a shift
together, illustrated in D of Fig. 6.3. Now apply (1) and (2) in reverse
order:

x′ = x + b and x″ = ax′ give x″ = ax + ab.

Hence the transformation (1) × (2) is not the same as the
transformation (2) × (1); they are different examples of the general
type (magnification and shift). Transformations (1) and (2) are not
commutative.
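
The two orders of application can be compared numerically. The
following is a minimal sketch in Python, not part of the original
text; the helper names magnify, translate and compose, and the values
a = 3, b = 2, are assumptions chosen for illustration.

```python
def magnify(a):
    """Rule (1): x' = a x."""
    return lambda x: a * x

def translate(b):
    """Rule (2): x' = x + b."""
    return lambda x: x + b

def compose(second, first):
    """Product of two transformations, read from right to left."""
    return lambda x: second(first(x))

a, b = 3, 2
t21 = compose(translate(b), magnify(a))  # (2) x (1): x'' = a x + b
t12 = compose(magnify(a), translate(b))  # (1) x (2): x'' = a x + a b

x = 5
print(t21(x))  # 17, i.e. 3*5 + 2
print(t12(x))  # 21, i.e. 3*5 + 3*2
```

Two magnifications, or two translations, composed in either order give
the same result; only the mixed product depends on the order.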

The transformations are interpreted in terms of moving a figure,
the axes and units being unchanged. They can be interpreted equally
well in terms of a fixed figure, the axes and units being changed. The
magnification by a : 1 is then a change in the unit for measuring x
(e.g. a = 3, changing from yards to feet). The translation by +b is a
change in the origin for measuring x (e.g. distances in miles E. of
London on old scale, E. of New York on new scale). The general
(magnification and shift) transformation is a change in both unit and
origin for x; e.g. x′ = 32 + 9x/5 changes the measure of a temperature
from x° C. to (x′)° F.

(ii) Movements of a figure. The rotation of a figure can be handled on
exactly the same lines as in (i), involving a transformation from
co-ordinates (x, y) to co-ordinates (x′, y′). (See 6.9 Ex. 15.) Consider,
however, a rather different geometrical approach, when the figure
to be moved is regular, e.g. an equilateral triangle, a square, or
(generally) a regular polygon. The case of the equilateral triangle Δ
suffices for illustration. Another case is given in 6.9 Ex. 16.

Cut Δ out of cardboard and consider all the ways in which the piece
can be put back in the hole left in the cardboard. If the original Δ has
vertices lettered A, B and C, there are six ways of moving Δ into
place, as shown in the table:

[Table: the six positions of the triangle, labelled 1, ω, ω², p, q, r]

Here: 1  : leave Δ unchanged
      ω  : rotate anti-clockwise through 120° (A to B, B to C and
           C to A)
      ω² : rotate anti-clockwise through 240° (A to C, B to A and
           C to B)
      p  : turn over about the perpendicular from A to BC
      q  : turn over about the perpendicular from B to AC
      r  : turn over about the perpendicular from C to AB.

Each of these is a transformation of Δ into another position in
which the vertices are different. The set of six can be called the
movements of the triangle.

The first three (1, ω and ω²) correspond to rotations and they
combine easily among themselves. In particular:

ω followed by ω:   A to B to C, B to C to A, C to A to B, which is ω².

Hence ω × ω (two successive transformations) is ω², i.e. two rotations
of 120° give one rotation of 240°. This is the justification for the use
of the notation ω². Further:

ω followed by ω²:  A to B to A, B to C to B, C to A to C, which is 1;
ω² followed by ω:  A to C to A, B to A to B, C to B to C, which is 1;

i.e. this double application is commutative and ω × ω² = 1. A rotation
of 120° and one of 240° combine to give a rotation of 360°, which
leaves Δ unchanged. The three transformations 1, ω, ω² are
commutative and, clearly, linked with the cube roots of unity in some
way.

The other three (p, q, r) can be combined with themselves or with
the first three, and the results are non-commutative. For example:

ω followed by p:   A to B to C, B to C to B, C to A to A, which is q;
p followed by ω:   A to A to B, B to C to A, C to B to C, which is r.

The complete set of pairs of transformations can be worked out and
tabled:

                      First transformation:
                   |  1    ω    ω²   p    q    r
             ------+------------------------------
Second         1   |  1    ω    ω²   p    q    r
transf.:       ω   |  ω    ω²   1    r    p    q
               ω²  |  ω²   1    ω    q    r    p
               p   |  p    q    r    1    ω    ω²
               q   |  q    r    p    ω²   1    ω
               r   |  r    p    q    ω    ω²   1

(iii) Permutations of a collection of objects. Consider the case of three
objects, placed in an original order: (A, B, C). There are six
permutations:

(A, B, C) →  p₁: (A, B, C)   p₂: (C, A, B)   p₃: (B, C, A)
             p₄: (A, C, B)   p₅: (C, B, A)   p₆: (B, A, C).

Each of these is a transformation of the collection of objects from one
order to another. They can be combined, e.g.

p₂ followed by p₄:  (A, B, C) → (C, A, B) → (C, B, A) = p₅
p₄ followed by p₂:  (A, B, C) → (A, C, B) → (B, A, C) = p₆.

Hence two successive permutations give another permutation, but the
process need not be commutative. There is, however, no need to
pursue further since this set of six transformations is essentially the
same as the set of six transformations in (ii), with p₁ = 1, p₂ = ω,
p₃ = ω², p₄ = p, p₅ = q, p₆ = r. It is only a matter of interpretation: the
movements of the triangle correspond to the permutations of the
labels (A, B, C) attached to the vertices.
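
The two combinations quoted can be checked by rearranging a tuple.
The following is a minimal sketch in Python, not part of the original
text; it assumes that each permutation is stored as the list of old
positions (counted from 0) which fill the new first, second and third
places, and the names PERMS, apply and followed_by are illustrative.

```python
# Each permutation lists which old position supplies each new position,
# so that "p2" turns (A, B, C) into (C, A, B).
PERMS = {
    "p1": (0, 1, 2),
    "p2": (2, 0, 1),
    "p3": (1, 2, 0),
    "p4": (0, 2, 1),
    "p5": (2, 1, 0),
    "p6": (1, 0, 2),
}
START = ("A", "B", "C")

def apply(name, order):
    """Rearrange the tuple `order` according to the named permutation."""
    return tuple(order[i] for i in PERMS[name])

def followed_by(first, second):
    """Name of the single permutation equal to `first` followed by `second`."""
    result = apply(second, apply(first, START))
    return next(n for n in PERMS if apply(n, START) == result)

print(followed_by("p2", "p4"))  # p5
print(followed_by("p4", "p2"))  # p6
```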


6.4. Groups of transformations. From the examples given, it is clear
that transformations can be arranged in sets. There is, from (i) of
6.3, the infinite set of magnification/shift transformations x′ = ax + b
for various real values of a and b. There is, from (ii) or (iii), the finite
set of six movements of a triangle, or six permutations of three
objects. The elements of such a set can be combined, one
transformation being applied after another. There is, therefore, a binary
operation defined in the set; it can be called multiplication and
denoted ×, though it is to be interpreted as successive applications
of the named transformations. The product of any two transformations
may or may not be commutative. The question naturally arises:
is the set of transformations a group?

The range and depth of the concept of a set and group are greatly 
increased. We have met sets and groups of numbers, or of number 
sequences (e.g. polynomials). We now have sets and groups of 
operations or transformations. The elements which compose the sets 
are now something quite different. Wide new vistas are opened up. 
Historically, indeed, the concept of a group was developed first for 
transformations, and later generalised to other entities such as 
numbers or polynomials. We have here simply reversed the historical 
order. 

A definition and a notation are required for transformations of the 
kinds considered (and many others): 


DEFINITION: A transformation is the result of applying a specified 
operator r to certain objects A, B,C,.... 


NOTATION: A transformation T is written A → r(A), where the
operator r applied to A gives r(A).
For example, A may be a triangle and r may be the operator ‘rotate 
through 120°’; or A may be a collection of three things in order and 
r may be the operator ‘permute the collection by interchanging the 
second and third things’. The terms ‘operator’ and ‘transformation’ 
are not always kept distinct. It is as well to reserve the term ‘operator’ 
for the rule whereby one object is changed or transformed into another 
object, and the term ‘transformation’ to the whole process of selecting 
A and of getting the transformed object r(A). 

The product of two transformations (operators r and s) is defined: 


DEFINITION: The product of two transformations is the result of their
successive application. If r is first applied to A and then s applied to the
result r(A), the product is: s(r(A)) = sr(A).

Here sr(A) is to be interpreted: r first, then s. The other product 
r(s(A)), or more simply rs(A), is to be interpreted: s first, then r. 




The operators are put in front of the object and read from right to 
left.* 

Consider a set of transformations: r(A), s(A), t(A), ... in which
products are defined as the double transformations such as sr(A)
or rs(A). The products need not be commutative: sr(A) ≠ rs(A). A
different transformation may arise from r first, s second, than from
s first, r second. To establish that we have a group of transformations,
we must check that the four postulates for a group are satisfied.
The set must be closed: sr(A) must itself belong to the set if r(A)
and s(A) belong. Products must be associative: t(sr)(A) = (ts)r(A),
i.e. the double operator sr first and t second must give the same result
as r first and the double operator (ts) second. There must be an
identity 1, i.e. an operator 1 which leaves the object unchanged and
which therefore gives r × 1(A) = 1 × r(A) = r(A). Finally, there must
be inverse (reciprocal) transformations: each operator r has an
inverse r⁻¹ such that rr⁻¹(A) = r⁻¹r(A) = A. The inverse
transformation undoes what is done by the original transformation. Two
examples illustrate:

(i) The group of translations of a figure. The translation b(A) shifts
the figure A a distance b to the right. If any point in the plane (e.g.
a corner of the figure A) has co-ordinates (x, y) referred to axes Oxy
(Ox horizontal), then the transformed point has co-ordinates (x′, y)
where x′ = x + b. This is a single transformation (which may be applied
to various figures A, B, C, ...) if b is a specified value. A set of
transformations is obtained as b varies. Suppose b = n, a positive, zero or
negative integer. The set of translations is then n(A), i.e. x′ = x + n,
as n ranges over the set J of integers. The product of two translations
is mn(A) = nm(A), where successive shifts of n and of m (or
conversely) result in a shift of (m + n). The first shift gives x′ = x + n, the
second gives x″ = x′ + m, and the product of the two shifts gives
x″ = x + (m + n). Hence, the properties of the set of translations
n(A) under products are paralleled by the properties of the set J of
integers under sums. J is a group under addition and so is the set of
translations n(A) under multiplication. Both are commutative.

* There is an alternative notation, sometimes used and giving rise to
some confusion, in which the operators are put behind the object and
read from left to right: A(rs) for (Ar)s, or first r operating on A,
then s on the result.

(ii) The group of the equilateral triangle. The set is a finite one,
consisting of six transformations (operators 1, ω, ω², p, q, r) applied
to the triangle Δ as in example (ii) of 6.3. The set is found to be a
non-commutative group. The result applies equally to the set of six
permutations of a collection of three things, as given in example (iii)
of 6.3; this set of permutations is the same non-commutative
group.

Products of transformations (operators in succession) are defined
according to the table of 6.3. So: pω(Δ) = q(Δ) and ωp(Δ) = r(Δ),
illustrating that products are not commutative. They are closed and
associative; for example:

ω²(pω)(Δ) = ω²q(Δ) = r(Δ)
(ω²p)ω(Δ) = qω(Δ) = r(Δ).

There is an identity, the operator 1 leaving Δ unchanged. Finally,
each transformation has its inverse, given by the operator which
undoes what the original one does. Whenever an entry 1 appears in
the table, the two operators concerned are inverse. Since each row
has just one entry 1, there are unique inverses: ω and ω² are inverse
to each other; p, q and r are each its own inverse. Hence the set is a
non-commutative group. There is also a subgroup of three
transformations (operators 1, ω, ω²), as is seen at once from the table.
The product of any two of these three is also one of the three. The
inverse of any one of the three is also one of the three. These are the
conditions for a subgroup by the theorem of 6.2.

The subgroup (1, ω, ω²) corresponds to rotations of Δ through 0°,
120°, 240° respectively. Unlike the complete group of six movements,
the subgroup is commutative and cyclic. Take powers of the rotation
ω (through 120°):

ωω(Δ) = ω²(Δ);   ωωω(Δ) = ωω²(Δ) = 1(Δ)

i.e. the subgroup consists of ω, its square ω² and its cube ω³ = 1.
These are rotations through 120°, 240° and 360°, the last being the
same as no change. The three operators can be associated with the
cube roots of unity (1, ω, ω²), where ω = ½(−1 + i√3), complex
numbers representing the vertices of an equilateral triangle on an
Argand Diagram (as shown in 3.8). Multiplication by the complex
number ω moves one vertex of the triangle in the diagram to the next
(anti-clockwise), and this is the same as rotating the triangle through
120° (operator ω). Everything links together neatly. The cyclic
group, ω, ω², ω³ = 1, can be used either for the complex cube roots of
unity, or for rotations of an equilateral triangle.


6.5. Fields. We are primarily interested in sets when (and because) 
they have the structure of a group in respect of some binary operation. 
Some sets have only one such operation to consider, e.g. sets of 
transformations under multiplication. Other sets have two such 
operations, usually sums and products, e.g. sets of numbers or 
polynomials. These are systems of double composition: a set {a, b,c, ...} 
in which a sum (a + b) and a product ab are defined for each pair of
elements. The questions which arise are: is the set a group from the 
point of view of each of the operations by itself, and how are the 
operations related? 

Here we consider the most obedient of all systems of double com- 
position, that described as a field. As shown in 15.3, the concept of a 
field can be derived from the combination of two groups (one under 
+ and the other under x) with a distributive property linking + 
and x. The development proceeds by first defining the more general 
(less specialised) concept of a ‘ring’ and then by adding extra 
properties until the less general (more specialised) field is obtained. 
Alternatively, an independent definition can be written for a field, 
specifying the properties of sums and products (closure, associative, 
commutative), the distributive rule connecting them, and the
requirement that equations of the simple kind: x + a = b and ax = b
should have solutions. The group properties of a field are then 
deduced. The end result of the two procedures is the same (see 15.3). 
Here, having already developed the group concept, we now write 
quite simply: 

DEFINITION: In a set F = {a, b, c, ...} of elements of any kind, two
binary operations give sums (a + b) and products ab. F is a field if
(1) the elements form a commutative group (F +) under addition with
    identity zero (0);
(2) the elements other than zero form a commutative group (F ×)
    under multiplication with identity unity (1);
(3) sums and products are distributive: a(b + c) = ab + ac.


A field is a set which unites in itself two groups, the additive and the
multiplicative, and in which the distributive rule links the two
operations.

Several features are to be noticed. Firstly, there are two particular
elements of F, zero (0) and unity (1). By use of 0, the additive group
gives negatives and hence subtraction: a − b = a + (−b). By use of 1,
the multiplicative group gives reciprocals and hence division:
a/b = ab⁻¹. Any element can be multiplied by 0 (i.e. a × 0 = 0, see 6.9
Ex. 20) but 0 must itself be excluded from the multiplicative group.
The reciprocal b⁻¹ and quotient a/b can only be written if b ≠ 0.
Secondly, the existence of inverses (negatives and reciprocals) implies
cancellation:

If a + b = a + c, then b = c;   if ab = ac (a ≠ 0), then b = c.

In particular, there are no zero divisors: if ab = 0, then a = 0 or b = 0.
Thirdly, only one form of the distributive rule is given. Another form
follows since F has the structure of a commutative group:

(a + b)c = c(a + b) = ca + cb = ac + bc.

The dual distributive property, as valid for Boolean algebra, does
not hold for a field, i.e. a + bc ≠ (a + b)(a + c).

As a result, a field F satisfies all the operational rules of 2.2 with
no exceptions whatever. Conversely, any set which satisfies all the 
rules is a field. If the set falls short of this, then some operational rule 
or other goes by the board. 

Just as groups have subgroups, so a field has all kinds of subsets of 
which some can be fields in themselves (i.e. subfields). Necessary and 
sufficient conditions for a subset K of a field F to be a subfield stem 
from the conditions for a subgroup (6.2). A form of the conditions 
which is easy to use in practice is: (a) if a, b ∈ K, then so do a + b and
ab, and (b) if a ∈ K, so do −a and a⁻¹ (a ≠ 0). A neater form is given in
6.9 Ex. 21. In particular, a subfield must contain both 0 and 1, the
identities of F. One field can be extended into and contained within
a larger field; and, in its turn, the larger field can be extended again 
into a still larger one. The process of adjunction of a new element 
as in 2.3, is useful in this connection. Conversely, a given field may 
contain a subfield, the subfield a further subfield, and so on. In such 
a set of ‘Chinese boxes’, we may look for the smallest subfield, the 
inner-most box. Such a subfield, called a prime field, has no proper 
subset which is itself a field. 




Though highly specialised, a field is a set which quite commonly 
occurs in mathematics and it is the most practical and obedient kind 
of set we have: 

(i) The set R of rationals. All the operational rules are valid, for
ordinary sums and products, and R is a field. No proper subfield can
be found, i.e. R is a prime field. In particular, the set J of integers is
not a field, since it lacks reciprocals. J is the kind of ‘ring’ (an integral 
domain) which is a very near approach to a field, without quite 
making it. The development of 2.6 shows how the integral domain 
J is turned into the field R by defining fractions and making good the 
lack of reciprocals. A term used in the past for a field is ‘domain of 
rationality’, expressing this idea. 

(ii) The set R(√2) = {a + b√2 | a and b rationals}. This is a field,
obtained by adjunction of √2 to R (2.3). It has a subfield, i.e. R.
The reason why a + b√2 provides a closed set, not only for sums and
differences (which is obvious enough) but also for products and
quotients, is because:

(a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2

and

(a + b√2)/(c + d√2) = [(ac − 2bd) + (bc − ad)√2]/(c² − 2d²)

as shown in Appendix A.6. So we always keep within the form
a + b√2.
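
The closure of the form a + b√2 can be seen by carrying the pair of
rationals (a, b) through the product and quotient rules. The following
is a minimal sketch in Python, not part of the original text; the
class name QSqrt2 is an assumption, and exact rational arithmetic is
supplied by the standard fractions module.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class QSqrt2:
    """The number a + b*sqrt(2) with rational a and b."""
    a: Fraction  # rational part
    b: Fraction  # coefficient of sqrt(2)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        a, b, c, d = self.a, self.b, other.a, other.b
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return QSqrt2(a * c + 2 * b * d, a * d + b * c)

    def __truediv__(self, other):
        a, b, c, d = self.a, self.b, other.a, other.b
        # c² − 2d² is non-zero for rational c, d not both zero
        denom = c * c - 2 * d * d
        return QSqrt2((a * c - 2 * b * d) / denom, (b * c - a * d) / denom)

x = QSqrt2(Fraction(1), Fraction(2))    # 1 + 2√2
y = QSqrt2(Fraction(3), Fraction(-1))   # 3 − √2

print(x * y)        # rational parts −1 and 5, i.e. −1 + 5√2
print(x / y)        # rational parts 1 and 1, i.e. 1 + √2
print((x / y) * y == x)  # True
```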

(iii) The sets of real and complex numbers are both fields, satisfying
all operational rules. The Chinese boxes are building up from the
prime field R:

R ⊂ R(√2) ⊂ R* ⊂ C.

Instead of R(√2), we could substitute fields obtained by adjunction
of other surds. C is obtained by adjunction of i to R*:

C = R*(i) = {a + ib | a and b real}.

Again closure is obtained with the form a + ib, for a similar reason to
that just given for a + b√2. Here:

(a + ib)(c + id) = (ac − bd) + i(ad + bc)

and

(a + ib)/(c + id) = (ac + bd)/(c² + d²) + i(bc − ad)/(c² + d²).




(iv) The set F(x) of rational fractions

F(x) = {f(x)/g(x) | f(x) and g(x) polynomials}

is a field, obtained from the integral domain F[x] of polynomials
over the field F in the same way that R is obtained from J.

(v) The set of integers (mod n). Examples (ii) and (vi) of 6.2 give us 
what we need, apart from checking that + and x are connected by 
the distributive rule. The check is simple, since integers are so 
connected and since taking remainders on division by n makes no 
difference here (6.9 Ex. 18). Hence, if n is prime, the integers (mod n)
are a field. If n is not prime, the set falls considerably short of being 
a field; reciprocals are lacking and there are zero divisors. For 
example, in {0, 1, 2, 3} (mod 4), 2 has no reciprocal and 2 x 2=0. 
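
The shortfall for a composite n can be listed explicitly. The
following is a minimal sketch in Python, not part of the original
text; the function name field_report and the keys of the returned
dictionary are assumptions. The additive group properties are taken
for granted, since they hold for every n as in example (ii) of 6.2.

```python
def field_report(n):
    """Report how far the integers (mod n) are from being a field."""
    elems = range(n)
    nonzero = range(1, n)
    # distributive rule a(b + c) = ab + ac, all arithmetic taken mod n
    distributive = all(
        (a * ((b + c) % n)) % n == ((a * b) % n + (a * c) % n) % n
        for a in elems for b in elems for c in elems
    )
    # non-zero elements with no r such that a x r = 1 (mod n)
    lacking_reciprocal = [a for a in nonzero
                          if not any((a * r) % n == 1 for r in nonzero)]
    # non-zero elements a with a x b = 0 (mod n) for some non-zero b
    zero_divisors = sorted({a for a in nonzero for b in nonzero
                            if (a * b) % n == 0})
    return {
        "distributive": distributive,
        "lacking_reciprocal": lacking_reciprocal,
        "zero_divisors": zero_divisors,
        "is_field": distributive and not lacking_reciprocal
                    and not zero_divisors,
    }

print(field_report(4))
# {'distributive': True, 'lacking_reciprocal': [2], 'zero_divisors': [2], 'is_field': False}
print(field_report(5)["is_field"])  # True
```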


  +   | Even  Odd            + | 0  1
------+------------         ---+------
 Even | Even  Odd            0 | 0  1
 Odd  | Odd   Even           1 | 1  0

  ×   | Even  Odd            × | 0  1
------+------------         ---+------
 Even | Even  Even           0 | 0  0
 Odd  | Even  Odd            1 | 0  1


(vi) The set {Even, Odd}. The operations of + and × are defined
by the tables shown in comparison with the tables for the set {0, 1}
(mod 2). The latter set is a field since 2 is prime. The set {Even, Odd}
is an exact parallel, and also a field. We have the simplest kind of
field, of two elements, one being zero (even) and the other unity
(odd). Negatives come from 1 + 1 = 0, i.e. 1 (odd) is its own negative.
Reciprocals come from 1 × 1 = 1, i.e. 1 (odd) is its own reciprocal. It
may be noticed, in passing, that the set {1, −1} under × can be
associated with {0, 1} (mod 2) or with {Even, Odd} under +. The
association does not work for the other operation. {1, −1} is not a
field; it is not closed and has no zero under addition.
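
The statement in (v), that the integers (mod n) are a field just when n is prime, can be tried out by direct search. The following is a minimal sketch in Python; the function names are chosen here for illustration and are not part of the text.

```python
def reciprocals(n):
    """Map each non-zero residue a (mod n) to its reciprocal, where one exists."""
    return {a: b for a in range(1, n) for b in range(1, n) if (a * b) % n == 1}


def zero_divisors(n):
    """Non-zero residues a with a x b = 0 (mod n) for some non-zero b."""
    return sorted({a for a in range(1, n) for b in range(1, n) if (a * b) % n == 0})


def is_field(n):
    """Integers (mod n) are a field when every non-zero residue has a reciprocal."""
    return n > 1 and len(reciprocals(n)) == n - 1


print(reciprocals(5))    # every non-zero residue appears
print(reciprocals(4))    # 2 has no reciprocal
print(zero_divisors(4))  # 2 x 2 = 0 (mod 4)
```

The search is exhaustive, so it is only practical for small n; it illustrates the result rather than proving it.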


6.6. Algebraic numbers. An interesting, if rather academic, question
is the following. All zeros of all polynomials with rational coefficients
are complex numbers; how many different ones are there? First,
since every rational number α is the zero of some polynomial (e.g. the
linear polynomial x − α), the set of all zeros includes the countably
infinite set R of rationals. On the other hand, since every zero is a
complex number, the set of all zeros is included within the
non-countably infinite set C. It is easily seen that the set of all zeros is a
field, closed for sums and products, and that it is countably infinite
like the rationals (see 4.9 Ex. 22). This is a quite remarkable result.
Numbers which are roots of polynomial equations are called algebraic
numbers; they form a countably infinite field A.

The field A fits in between the prime field R and the larger field C;
A is a subfield of C and R a subfield of A: R ⊂ A ⊂ C. We know already
that the field R* of real numbers also fits in a similar way: R ⊂ R* ⊂ C.
The question is: how are A and R* related? They are both subfields
of C; both of them have a prime subfield R. First, we know that the
algebraic numbers of A, as roots of polynomial equations, include
all rationals, at least some irrationals (like √2) and at least some
non-real numbers (like i). Hence the intersection A ∩ R* is not empty
and it is a proper subset of A: (A ∩ R*) ⊂ A. A ∩ R* comprises the
real algebraic numbers, the real roots of polynomial equations. The
rest of A is made up of the non-real (complex) roots of polynomials.
On the other hand, we do not yet know whether (A ∩ R*) = R* or
whether (A ∩ R*) ⊂ R*, i.e. whether real algebraic numbers comprise
all the real numbers there are, or only a (proper) part of them.

The answer is easy: since A is countably infinite, so is A ∩ R* (real
algebraic numbers), whereas R* (all real numbers) is non-countably
infinite. Hence, (A ∩ R*) ⊂ R*, i.e. real algebraic numbers are no
more than a proper subset of all real numbers. The situation is
indicated in the Venn Diagram of Fig. 6.6.

[Fig. 6.6]

An immediate consequence is that there are real numbers which are not
roots of polynomial equations, just as there are roots of polynomial
equations which are not real. Such real numbers are called
transcendentals. We have shown that they exist. Examples are real
numbers like π and e of great importance in mathematics. Moreover,
they are not the exception, but very much the rule, among real numbers.
Of the non-countably infinite set R* of all real numbers, the real
algebraic numbers are no more than a countably infinite subset.
What is left is the subset of transcendentals, and these are a
non-countably infinite set in themselves. Transcendentals are very
thick on the ground.

6.7. Ordered fields. The development of real numbers (2.4) depends
on the fact that the rationals form, not only a field, but also an
ordered field. It is time to put the concept of order on a firmer basis.
In a set {a, b, c, ...} the idea of ‘succeeds’ or of ‘successor’ may be
taken as undefined. Nothing is added by using the alternative term
‘precedes’ or ‘predecessor’: if b succeeds a, then a precedes b. On the
other hand, the concept of ‘immediate successor’, if it is relevant,
needs definition. Suppose that (1) b succeeds a, and (2) any other
successor c of a is such that c succeeds b. Then b is defined as the
immediate successor of a. To visualise, imagine the elements of the
set as strung out from left to right, according to their succession,
along a line A as in the diagram.


A set {a, b, c, ...} is ordered in this primitive sense if each element
(with the possible exception of one, called the last) has an immediate
successor. Equally, each element (with the possible exception of one,
called the first) has an immediate predecessor. The symbol < is used
to indicate order: a<b denotes a is succeeded by b, or a precedes b.
The corresponding symbol > is: if a<b, then b>a. Equality, denoted
by =, can then be introduced to complete the symbols.

Two illustrations follow: 

(i) The set of integers, or any subset, is the case on which this
primitive ordering depends. J = {... −2, −1, 0, 1, 2, ...} is ordered
as shown according to <, with no first or last term. J⁺ = {1, 2, 3, ...}
is similarly ordered but with a first element. In the present primitive
sense, the subset {0, 1, 2, ... n − 1} can be taken as ordered, with a
first and a last element.

(ii) As something different, consider the order of six houses, all on 
one side of a road. They can be ordered and numbered from 1 to 6; 
but there is more than one way of achieving this. The order can be 
in the spatial sense, houses taken in sequence along the road from 
one end. But it could be (as in Japan) in a historical or chronological 
sense, numbering from 1 to 6 being according to the date of completion
of building. (The complication of two houses completed at the
same date involves the use of < and = in the order.) The second 
method is not so practical since we move in space and not in time. 
There are further orderings possible if the six houses are on both 
sides of the road, e.g. numbering from one end with odd numbers to 
the left and even to the right, or down one side and back the other. 

The definition of an ordered field* is based on this primitive idea of
order, but considerably refined and without reference to immediate
successors. In any field, there are sums, negatives and hence
differences. The suggestion is that the differences (b − a) of two distinct
elements can be positive or negative, which suffices to indicate
a<b and a>b respectively. The concept of order in a field depends
on the (undefined) concept of positiveness. This can be formalised:


DEFINITION: A field F = {a, b, c, ...} is ordered if it contains positive
elements such that:

(1) one and only one of the following holds for every element a:

a positive, a zero, −a positive

(2) if a and b are both positive, then a + b and ab are both positive.

This covers the case of a positive element a. By convention, however,
where −a is positive, a is said to be a negative element of F. The
condition (1) is then: a is positive, or zero, or negative.

Let a and b be any two elements of F. Since F is a field, the
difference b − a is defined as an element of F and condition (1) shows
that it is positive, or zero, or negative. Suppose b − a is positive, then
we can write a<b and b>a. These are alternative and equivalent
notations for the same thing: b − a positive. They specify what the
symbols ‘<’ and ‘>’ mean in an ordered field:


Notation: If b − a is positive in an ordered field, write a<b (read:
a less than b) and write b>a (read: b greater than a). The notations
a<b and b>a are equivalent.

Suppose b − a is negative. Then −(b − a) = a − b is positive and the
new notation gives alternatively: b<a and a>b. Finally, suppose
b − a is zero. Then b + (−a) = 0 or b is the negative of (−a). Since a
is known to be the negative of (−a), b − a zero implies a = b.
Condition (1) gives: b − a is positive, or zero, or negative. This can now
be re-written: a<b, or a=b, or a>b. The condition (2) can be
re-expressed in terms of < and > in much the same way, as in 6.9
Ex. 26. So:


THEOREM: If F = {a, b, c, ...} is an ordered field, then

(i) Trichotomy: a<b, or a=b, or a>b

(ii) Consistency: if a<b, then a+c<b+c (any c) and ac<bc (any
positive c).

Hence, in an ordered field, we can write <, = or > between any
two elements. There is no difficulty in using ≤ for ‘< or =’, and ≥
for ‘> or =’. In particular, since 0 is an element of the field, a>0
can be written for a − 0 positive, i.e. a positive; and a<0 can be
written for 0 − a positive, i.e. −a positive, a negative.


* It can be applied to an integral domain as well as to a field, e.g. J is an ordered
domain, but F[x] is not.

The field R of rationals is ordered since positiveness is ensured by
the convention that the rational p/q>0 if the product of integers
pq>0. The trichotomy and consistency properties of the theorem
above are among the order properties of rationals as set out in 2.2.

On the other hand, examples of non-ordered fields are easily got.
Neither the field C of complex numbers nor the field F(x) of rational
fractions is ordered, since the elements do not have the trichotomy
of positive/zero/negative. It is not so obvious that the field of
integers (mod prime n) is not ordered in the present precise sense.
It is condition (2) which fails: if a and b are positive, then a + b is
positive. Here, 1 and 2 are positive, but 1 + 2 = 0 is not, in the field
of integers (mod 3).
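
The failure can also be confirmed by brute force: try every possible choice of a set P of ‘positive’ residues (mod p) and test conditions (1) and (2) of the definition. The sketch below is an illustration only; the function name `orderings` is an assumption made here, and the search covers small primes, not the general case.

```python
from itertools import combinations


def orderings(p):
    """Return every set P of 'positive' residues (mod p) which meets
    conditions (1) and (2) of the definition of an ordered field."""
    nonzero = range(1, p)
    found = []
    for size in range(p):
        for P in map(set, combinations(nonzero, size)):
            # (1) for each non-zero a, exactly one of a and -a is positive
            trichotomy = all((a in P) != ((-a) % p in P) for a in nonzero)
            # (2) sums and products of positive elements are positive
            closed = all((a + b) % p in P and (a * b) % p in P
                         for a in P for b in P)
            if trichotomy and closed:
                found.append(P)
    return found


print(orderings(3))  # no admissible set of positive elements
```

For each prime tried, the list comes back empty: no choice of positive elements satisfies both conditions.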

The concept of an ordered field can be developed as in 2.4: 


DEFINITION: A field F is completely ordered if it is ordered and
such that: any subset of F which has a lower bound has a GLB; any 
subset of F which has an upper bound has a LUB. 


In the number system only the field R* of real numbers is completely 
ordered. 


6.8. Inequalities. The properties of an ordered field provide all the
material needed for handling inequalities. There is a relation of
equality in any field. For example, in the non-ordered field of complex
numbers, one number can be equated to another, involving the
process of ‘equating real and imaginary parts’. But the relation of
inequality arises when we have an ordered field. Inequalities are
subject to operational rules, derived from the properties of ordered
fields; since they are somewhat tricky, we must lay them out carefully.

In the following properties, a<b (or b>a) means b − a positive in
an ordered field F = {a, b, c, ...}:


(i) If a<b, then a+c<b+c (any c) and ac<bc (c>0) (Consistency).
(ii) If a<b and b<c, then a<c (Transitivity).
(iii) If a<b and c<d, then a+c<b+d.
(iv) If a<b, then −a > −b.
(v) If a<b and ab>0, then 1/a > 1/b.


The further properties below relate to the particular case of positive 
and negative elements, i.e. a>0 means a positive, a<0 means a 
negative: 


(vi) If ab>0, then either a>0, b>0 or a<0, b<0.
If ab<0, then either a>0, b<0 or a<0, b>0.
(vii) If a>0, then 1/a>0. If a<0, then 1/a<0.
(viii) If a>b>0, then 1/b>1/a>0. If a<b<0, then 1/b<1/a<0.


Many of the proofs are straightforward applications of the definition
of ordered fields in 6.7. One or two are, however, a little awkward.
Notice, for example, property (v). Here, a<b does not necessarily
imply 1/a>1/b. This implication is valid if and only if a and b have
the same ‘sign’, meaning ab>0 as indicated in property (vi). For
example, 2<3 does imply 1/2>1/3 and −3<−2 does imply −1/3>
−1/2; but, though −2<3, it is not true that −1/2>1/3.
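
Property (v) can be tried out with exact rational arithmetic. The following sketch uses Python's `fractions` module; the helper name `reciprocal_reverses` is chosen here for illustration.

```python
from fractions import Fraction


def reciprocal_reverses(a, b):
    """For non-zero a < b, report whether 1/a > 1/b."""
    a, b = Fraction(a), Fraction(b)
    assert a < b and a != 0 and b != 0
    return 1 / a > 1 / b


print(reciprocal_reverses(2, 3))    # same sign
print(reciprocal_reverses(-3, -2))  # same sign
print(reciprocal_reverses(-2, 3))   # opposite signs
```

In each case the answer agrees with the test ab>0: the inequality is reversed on taking reciprocals when a and b have the same sign, and not otherwise.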

A characteristic feature of an ordered field F is that the square of
any non-zero element is necessarily positive:


a² = 0 if a = 0; a² > 0 if a ≠ 0.

This is not necessarily so in a non-ordered field. For example, in the
field of complex numbers, the element i has a real square i² but it is
negative: i² = −1. The result extends:

If a₁, a₂, ... aₙ are elements of an ordered field, then


Σ (r = 1 to n) a_r² > 0 unless a₁ = a₂ = ... = aₙ = 0.


To illustrate the handling of equalities and inequalities, consider
cases of polynomials or rational fractions where x is taken as a real
number. Pursuing the distinctions of 1.4 and 1.5 above, we say that
an equation holds for certain x which compose its solution set, as in
the simple examples:


       Equation                 Solution set
(i)    5 − 2x = 0               x = 5/2
(ii)   x² − 3x + 2 = 0          x = 1, 2
(iii)  x² − 3x + 2 = 6/x        x = 3


Graphically, the solution set corresponds to the intersection of two
sets, each consisting of pairs of real numbers (x, y) shown as points
in the plane Oxy. For example, in (ii), y = x² − 3x + 2 gives the set of
points (x, y) of the curve shown in Fig. 6.8. If the second set is y = 0,
i.e. the line Ox, then the solution set for (ii) is the intersection of the
two sets. This is the pair of points A(1, 0) and B(2, 0). So: x = 1, 2.
In (iii), with the same first set, then y = 6/x gives another set of points
(x, y) on the second curve of Fig. 6.8. The intersection set is now the
single point C(3, 2). So: x = 3.




Now consider inequalities in the same way: 


       Inequality               Solution set
(i)    5 − 2x > 0               x < 5/2
(ii)   x² − 3x + 2 < 0          1 < x < 2
(iii)  x² − 3x + 2 < 6/x        0 < x < 3


To spell these out carefully, we proceed as follows. For the first two
inequalities:

(i) 5 − 2x > 0 gives −2x > −5    by property (i) with c = −5
               gives 2x < 5      by property (iv)
               gives x < 5/2     by property (i) with c = 1/2 > 0

(ii) x² − 3x + 2 = (x − 1)(x − 2) < 0 gives by property (vi):
     either x − 1 > 0 and x − 2 < 0
            i.e. x > 1 and x < 2    by property (i)
     or     x − 1 < 0 and x − 2 > 0
            i.e. x < 1 and x > 2    by property (i).


The first is consistent, the second not. Hence 1<x<2.

Graphical methods again give the solution set as the intersection
of two sets. In (ii), the set of points on the curve y = x² − 3x + 2 and
the set of points (y<0) below the axis Ox intersect in the set of
points on the curve from A to B. For these points: 1<x<2. In (iii),
the same curve is related to the set of points (y<6/x) below the
second curve y = 6/x. The intersection set consists of points on the
first curve between D and C. For these points: 0<x<3.
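
As a numerical check on the three solution sets, each inequality can be tested at a grid of rational values of x and compared with the interval claimed. This is a sketch only: the grid (multiples of 1/8 between −5 and 5) and the names used are assumptions made for illustration, and sampling is no substitute for the argument from the properties.

```python
from fractions import Fraction

# Sample points x = k/8 for -40 <= k <= 40, omitting x = 0 where 6/x is undefined.
samples = [Fraction(k, 8) for k in range(-40, 41) if k != 0]


def curve(x):
    """The quadratic common to (ii) and (iii)."""
    return x * x - 3 * x + 2


# Each entry pairs the inequality itself with its claimed solution set.
inequalities = {
    "(i)": (lambda x: 5 - 2 * x > 0, lambda x: x < Fraction(5, 2)),
    "(ii)": (lambda x: curve(x) < 0, lambda x: 1 < x < 2),
    "(iii)": (lambda x: curve(x) < 6 / x, lambda x: 0 < x < 3),
}

agree = {name: all(holds(x) == claimed(x) for x in samples)
         for name, (holds, claimed) in inequalities.items()}
print(agree)
```

Exact fractions are used so that the boundary points x = 1, 2, 5/2 and 3 are tested without rounding error.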


6.9. Exercises 


1. A group depends on a binary operation ∗, usually + or ×; explain why
neither subtraction (−) nor division (÷) is a suitable operation ∗. See 2.9
Ex. 3.

2. Show that S = {... −4, −2, 0, 2, 4, ...} is a group under + and hence a
subgroup of the group J of all integers. Check by the conditions (6.2) for a
subgroup. Extend to the set of multiples of any integer.

3. Show that the set {... 1/4, 1/2, 1, 2, 4, ...} of integral powers of 2 is a group
under ×, a subgroup of the group R of all rationals. Extend as in Ex. 2.

4. Show that the set R* of real numbers is a group under + and also (when
0 is excluded) under ×; establish the same result for the set C of complex
numbers.




5. Solution of linear equations in a group. G = {a, b, c, ...} is a group under ∗,
so that x = b⁻¹ ∗ a is an element of G. Show that x satisfies b ∗ x = a. Further,
if x₁ and x₂ both satisfy b ∗ x = a, show that x₁ = x₂. Deduce that the linear
equation b ∗ x = a has a unique solution x = b⁻¹ ∗ a in a group.

6. As a case of Ex. 5, show that an additive group (commutative) is
characterised by the fact that b + x = a and x + b = a alike have the unique solution
x = a − b.

7. Show that, in a group under ×, the linear equation bx = a has the unique
solution x = b⁻¹a, and xb = a the unique solution x = ab⁻¹. If the group is
commutative, bx = a and xb = a equally have the unique solution x = a/b.

8. Cyclic groups under addition. If na = 0, show that {a, 2a, 3a, ... na} is a
finite cyclic group under + with group identity 0. Illustrate by writing the
integers (mod n) as {1, 2, 3, ... n} where n = 0.

9. Show that the × table for integers (mod n) has 0 in the pth row and qth
column if and only if n = pq/r for some positive integer r. Indicate how this
supports the proposition that the integers (mod n), excluding 0, form a group
under × if and only if n is prime.

10. In a group under ∗, show that the inverse of a ∗ b is b⁻¹ ∗ a⁻¹ (and not
a⁻¹ ∗ b⁻¹ unless the group is commutative). Start from b⁻¹ ∗ a⁻¹ ∗ a ∗ b and
reduce by using a⁻¹ ∗ a = b⁻¹ ∗ b = e.

11. In a group with an identity e such that a ∗ e = e ∗ a = a (and with
cancellation valid), deduce that e is unique. (Show that, if there are two, e₁ and
e₂, so that a ∗ e₁ = a ∗ e₂ = a, then e₁ = e₂.) Similarly, show that a⁻¹ such that
a ∗ a⁻¹ = a⁻¹ ∗ a = e is unique.

*12. Cosets. Partition the set J of all integers into the set J₀ of even integers
and the set J₁ of odd integers. Under +, the group J has subgroup J₀ (see
Ex. 2). Check that J₁, which is not a group, is obtained from J₀ by adding
1 to each element. Sets J₀ and J₁ are called cosets. Extend the result, e.g. to the
case of five cosets of J: all multiples of 5 making up the subgroup J₀, and J_r
(r = 1, 2, 3, 4) being obtained by adding r to each element of J₀. Compare
3.9 Ex. 18 on residue classes. (The sets J_r are both residue classes, when
derived from remainders on division by a given integer, and also cosets, when
obtained by adding the same integer to the elements of some subgroup J₀.)

*13. Factor group. Consider the set {J₀, J₁} of two elements, the sets defined
in Ex. 12. Define J_r + J_s = set of elements obtained by adding an element of J_r
to an element of J_s (r = 0, 1; s = 0, 1). Show that J₀ + J₀ = J₁ + J₁ = J₀ and that
J₀ + J₁ = J₁ + J₀ = J₁. Hence show that {J₀, J₁} is a group under +. Such a
group is called a factor group. Extend as in Ex. 12.

*14. Partition S = {i, −1, −i, 1}, a cyclic group under ×, into S₀ = {1, −1}
and S₁ = {i, −i}. Show that S₀ is a group but not S₁. Each element of S₁ is a
multiple i of an element of S₀, another example of cosets. Show that the set
{S₀, S₁} of two elements is a group under ×, with the definition that
S_r × S_s = set of elements obtained by forming products of an element of S_r
and an element of S_s (r = 0, 1; s = 0, 1). This is another factor group.




15. Rotate axes Oxy through θ° (clockwise) to give new axes Ox′y′ as in
Fig. 6.9. The fixed point P has co-ordinates (x, y) changed to (x′, y′). By
showing that OM′ = OM cos θ − MP sin θ in the figure, establish that

x′ = x cos θ − y sin θ
and y′ = x sin θ + y cos θ

similarly. Show that this is also the transformation for the rotation of a
figure through θ° (anti-clockwise), axes being fixed.

[Fig. 6.9]

16. Movements of a square. First rotate a square through 0°, 90°, 180° and
270° and show that the transformations can be denoted by 1, i, −1, −i.
Then turn the square over (i) horizontally, (ii) vertically, (iii) about the
diagonal sloping upwards from left to right, (iv) about the diagonal sloping
downwards from left to right. Write these transformations p, q, r and s.
Completing the table of pairs of transformations as shown, deduce that the
transformations form a non-commutative group.


                       First transformation

              |   1    i   −1   −i    p    q    r    s
Second:    1  |   1    i   −1   −i    p    q    r    s
           i  |   i   −1   −i    1    r    s    q    p
          −1  |  −1   −i    1    i    q    p    s    r
          −i  |  −i    1    i   −1    s    r    p    q
           p  |   p    s    q    r    1   −1   −i    i
           q  |   q    r    p    s   −1    1    i   −i
           r  |   r    p    s    q    i   −i    1   −1
           s  |   s    q    r    p   −i    i   −1    1


17. In the group of Ex. 16, show that the subset of transformations
{1, i, −1, −i} form a cyclic subgroup which is commutative.

18. From the + and × tables for {0, 1, 2, 3, 4} (mod 5), check that the
distributive rule a(b + c) = ab + ac holds, by trying out various a, b and c.

19. Show that {−1, 0, 1} is a field, i.e. a commutative group under + and
(excluding 0) a commutative group under ×, if 1 + 1 = −1 is imposed.
Check that the distributive rule holds.

20. The additive identity (zero, 0) of any field F is excluded from the
multiplicative group. But 0 can be used in multiplication: a × 0 = 0 for any a.
Consider the question whether a × 0 = 0 should be specified as a requirement
for a field, in linking + and ×. Show that the answer is no, since a × 0 = 0
follows from the distributive rule. (Start with a(1 + 0) = a × 1 + a × 0, write
1 + 0 = 1 and use the additive cancellation rule.)

21. Subfields. Show that necessary and sufficient conditions for a subset K
to be a subfield of a field F are that K contains at least one non-zero element
and that, if a ∈ K, b ∈ K, then a − b ∈ K and ab⁻¹ ∈ K where b ≠ 0.




*22. Quaternions. Extend the concept of a complex number to the
quaternion, defined z = a + ib + jc + kd where a, b, c and d are real and where products
are subject to the conventions shown on products of i, j and k. Show that the
set of z's (all real a, b, c and d) satisfies the rules for a field, except that products
are not commutative. Such a set is called a skew field.


                    First element:
                  |   i    j    k
Second         i  |  −1   −k    j
element:       j  |   k   −1   −i
               k  |  −j    i   −1

(Each entry is the product of the first element by the second, e.g. ij = k, ji = −k.)


23. Field extensions. Pull together various ways of extending a given field
F into a larger field. First, from F is obtained the integral domain F[x] of
polynomials over F and the quotient field formed to turn F[x] into the field
F(x). This is the field of rational fractions, ratios of polynomials over F (3.4).
Second, the adjunction of an outside element x to F gives a field F(x), which
is also to be identified with the set of ratios of polynomials over F (3.4). The
field of complex numbers is obtained from the field of real numbers by
adjunction of i in this way. Third, select from the integral domain F[x] a
polynomial g(x) irreducible in F and write the field of polynomials mod g(x) (see
3.9 Ex. 20). The field of complex numbers arises from the field of real numbers
by taking g(x) = x² + 1 in this way (and see 7.9 Ex. 20).

*24. The field F = {0, 1} (mod 2) contains two elements only. The set of
polynomials over F can be enumerated (see 3.9 Ex. 11):

0, 1, x, x + 1, x², x² + 1, x² + x, x² + x + 1, ...

The polynomial x² + x + 1 is irreducible and the set of these polynomials,
mod x² + x + 1, is a field. Show that it consists of four elements: {0, 1, x, x + 1}.
Check that, in this field, x and x + 1 add and multiply alike to 1. [Note:
x + (x + 1) = 2x + 1 = 1 (mod 2); x(x + 1) = x² + x = −1 (mod x² + x + 1) = 1 (mod
2).]

25. Illustrate that a set which is not an ordered field can have a subset 
which is an ordered field by reference to the set C of all complex numbers. 

26. If a<b, show that a+c<b+c (any c) and ac<bc (any c>0), by use only
of the criterion that a<b if b − a is positive.

27. Define an ordered integral domain (lacking only reciprocals) as for an 
ordered field and illustrate with the set J of integers. 

28. Show that, if a<b and c<d, then a+c<b+d, but it is not necessarily
the case that ac<bd. Illustrate by examples in which a and c are both negative
while b and d are of opposite sign.

29. Prove and illustrate that

1/b > 1/a > 0 if a>b>0 and 1/b < 1/a < 0 if a<b<0.

30. Show that the inequalities x² + x + 1 < 1/(1 + x²) and x>0 have a solution
set which is empty. Deduce that x² + x + 1 > 1/(1 + x²) for all x>0.
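
As a check on the table of Ex. 16, the eight movements of the square can be modelled on the complex plane and the table generated mechanically. The sketch below rests on assumptions made here, not stated in the exercise: a rotation is taken as z → uz and a turning-over as z → u·conj(z), with p, q, r, s assigned u = 1, −1, i, −i respectively (so that p reflects in the horizontal axis through the centre). The names `apply`, `compose` and `table` are illustrative.

```python
# Each movement is z -> u*z (rotation) or z -> u*conj(z) (turning over).
NAMES = ["1", "i", "-1", "-i", "p", "q", "r", "s"]
U = {"1": 1, "i": 1j, "-1": -1, "-i": -1j, "p": 1, "q": -1, "r": 1j, "s": -1j}
FLIPS = {"p", "q", "r", "s"}  # assumed reading of the four turnings-over

CORNERS = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]  # corners of the square


def apply(name, z):
    """Image of the point z under the named movement."""
    return U[name] * (z.conjugate() if name in FLIPS else z)


def compose(second, first):
    """Name of the single movement equal to `first` followed by `second`."""
    image = [apply(second, apply(first, z)) for z in CORNERS]
    for name in NAMES:
        if [apply(name, z) for z in CORNERS] == image:
            return name
    raise ValueError("set of movements is not closed")


# table[(second, first)] matches the layout: rows 'Second', columns 'First'.
table = {(second, first): compose(second, first)
         for second in NAMES for first in NAMES}

print(table[("i", "p")], table[("p", "i")])  # differ, so not commutative
```

Since `compose` never raises an error, the set is closed; the two entries printed differ, which shows the group is not commutative.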


CHAPTER 7 
RELATIONS AND FUNCTIONS 


7.1. Relations. Consider sets of elements in the general sense of
Chapter 4; they may be groups or fields, but they may equally have
some quite different structure. Let X and Y be two sets and write
x ∈ X and y ∈ Y as representative elements. Subscripts can then be used
to indicate various elements: x₁, x₂, x₃, ... as elements of X; y₁, y₂,
y₃, ... as elements of Y.

Form the set of ordered pairs (x, y), ordered in the sense of x (from
X) first and y (from Y) second and not the other way round.* To
illustrate in graphical terms, suppose that X and Y are sets of
integers, rationals or real numbers. Then the elements of X can
be marked off on one axis Ox and the elements of Y on another axis
Oy, the ordered pair (x, y) being shown by a point in the plane Oxy.
If it happens that Y contains zero (0), then particular points (x, 0)
on Ox are in the set of (x, y) and correspond to the set X as well.
Similarly for the set Y on Oy. As a notation, the set of (x, y) is called
the Cartesian product and written X . Y:


Notation: The Cartesian product of two sets X and Y is the set of
ordered pairs
X . Y = {(x, y) | x ∈ X, y ∈ Y}.


Since the order of writing x and y is essential, the Cartesian products
X . Y and Y . X are different. As a particular case, for the single set
S = {x, y, z, ...}, there is a Cartesian product: S . S = {(x, y) | x ∈ S, y ∈ S}.
This is not the same as the smaller set {(x, x) | x ∈ S}. As simple
examples, take X = {1, 2} and Y = {1, 2, 3, 4}, so that X . Y is the
set of 8 elements illustrated by 8 points in Fig. 7.1a. Similarly Y . X
is a different set of 8 points; X . X is a set of 4 points and Y . Y one
of 16 points.
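
The counts just quoted can be reproduced with Python's `itertools.product`, which forms ordered pairs in exactly this sense. A brief sketch, with variable names chosen here for illustration:

```python
from itertools import product

X = {1, 2}
Y = {1, 2, 3, 4}

XY = set(product(X, Y))          # ordered pairs (x, y): the set X . Y
YX = set(product(Y, X))          # ordered pairs (y, x): the set Y . X
diagonal = {(y, y) for y in Y}   # the smaller set {(y, y)} within Y . Y

print(len(XY), len(YX), len(set(product(X, X))), len(set(product(Y, Y))))
print(XY == YX)
```

The two products have the same number of elements but are different sets, since (1, 3) belongs to X . Y and not to Y . X.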


* Strictly, an ordered pair is a primitive (undefined) concept subject to an axiom:
the property that (x, y) = (u, v) implies x = u and y = v.


[Fig. 7.1a]

The concept of a relation is a broad and sweeping one. Any
proper subset of X . Y is a relation R from the set X to the set Y.
Hence, R is a set: the set consisting of some but not all of the
ordered pairs (x, y) of X . Y. R is to be distinguished from its
specification, which may be by listing or by a general description of the
pairs (x, y) in R. The specification is written yRx, to be read ‘y is
related by R to x’, giving the rule for connecting x in X and y in Y.
R is a set and the specification yRx is a statement or a rule. The
concepts of Chapter 5 are relevant since we have both a statement yRx
and a corresponding set R. In the following, implication (p implies q,
or if p then q) and equivalence (p and q equivalent, or if p then q and
if q then p) are freely used as statements true in all logical possibilities.


DEFINITION: A relation R from the set X to the set Y is any proper
subset of X . Y. The specification yRx implies (x, y) ∈ R so that:


R = {(x, y) | x ∈ X, y ∈ Y, yRx}.

The statement yRx is to be read ‘y is related by R to x’.


Each x ∈ X may or may not have a y ∈ Y to correspond in yRx. Those
x for which there are y's make up a subset of X called the domain of
R. Similarly, each y ∈ Y may or may not correspond to an x ∈ X in
yRx. Those y for which there are x's make up a subset of Y called the
range of R.

In speaking of a relation, we can use the set R and the statement 
yRx interchangeably. Some examples illustrate: 


(i) X = {1, 2}, Y = {1, 2, 3, 4}. Write:

R₁ = {(x, y) | x ∈ X, y ∈ Y, y = 2x} and R₂ = {(x, y) | x ∈ X, y ∈ Y, y < 2x}.


The set of 8 elements X . Y has a subset of 2 elements, (1, 2) and
(2, 4), for R₁, i.e. the points A and B in Fig. 7.1a. It has a subset of
4 elements, (1, 1), (2, 1), (2, 2) and (2, 3), for R₂, i.e. the points below
A and B in Fig. 7.1a. The statements are: yR₁x for ‘y = 2x’ and
yR₂x for ‘y < 2x’.

(ii) X and Y both comprise all real numbers. Specify R₁ and R₂
as in (i). Then R₁ given by the statement ‘y = 2x’ is the set of points
on the line y = 2x in the plane Oxy, and R₂ for ‘y < 2x’ is the set of
points under the line y = 2x. They are both subsets of X . Y, the set
of all points in the plane.

(iii) X and Y both comprise all positive real numbers (x>0, y>0);
X . Y is the set of points in the positive quadrant of the plane Oxy.
Consider the relations:

R₁ = {(x, y) | x ∈ X, y ∈ Y, x + y = 1};

R₂ = {(x, y) | x ∈ X, y ∈ Y, x + y < 1};

R₃ = {(x, y) | x ∈ X, y ∈ Y, x² + y² = 1};

R₄ = {(x, y) | x ∈ X, y ∈ Y, x² + y² < 1}.

The sets R₁ and R₃ are shown by the points on the segment AB of the
line x + y = 1, and on the arc AB of the circle x² + y² = 1, respectively
in Fig. 7.1b. The set R₂ is shown by the points within the triangle
OAB, as shaded; the set R₄ corresponds to
points within the quarter-circle OAB.

(iv) Consider the 16 members of the tribe
of 4.1, example (iii), and write X for the set
of 7 males and Y for the set of 9 females.
Then X . Y is the set of 63 different pairings
of male/female. Define the relation:

R = {(x, y) | x ∈ X, y ∈ Y, y is the wife of x}.

Here yRx is the statement ‘y is the wife of x’.
The set R comprises 4 out of 63 pairs, two
married couples in the first and two in the second generation.

The variety of relations is very great: any limitation however slight
on x and y gives a relation. The x's and y's may be numbers, discrete
as in (i) or real variables as in (ii) and (iii); the relations may involve
an equation or equally well an inequality. But relations can be
perfectly well specified between non-numerical x's and y's, as in (iv)
where the elements are persons and the relation is the ordinary one
of wife to husband.
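
For finite sets, a relation can be built directly as a subset of the Cartesian product, and its domain and range read off. The sketch below takes the sets of example (i); the helper names `relation`, `domain` and `range_` are assumptions made here for illustration.

```python
from itertools import product

X, Y = {1, 2}, {1, 2, 3, 4}


def relation(rule):
    """The set R of pairs (x, y) of X . Y for which the statement yRx holds."""
    return {(x, y) for x, y in product(X, Y) if rule(x, y)}


def domain(R):
    """Those x for which there is at least one y with yRx."""
    return {x for x, _ in R}


def range_(R):
    """Those y for which there is at least one x with yRx."""
    return {y for _, y in R}


R1 = relation(lambda x, y: y == 2 * x)   # the statement 'y = 2x'
R2 = relation(lambda x, y: y < 2 * x)    # the statement 'y < 2x'

print(sorted(R1), sorted(R2))
print(domain(R2), range_(R2))
```

Here the rule (a function of x and y) plays the part of the statement yRx, and the set returned plays the part of R; both relations are proper subsets of X . Y.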


[Fig. 7.1b]


7.2. Equivalence. Consider first an important, if very special, kind
of relation, that of equivalence between elements of a given set
S = {x, y, z, ...}. In a set of numbers it means equality (=); in other
sets, it has a wider connotation. For example, let S consist of all
finite sets A, B, C, ... as in 4.5. Then equivalent sets are those which
have the same count of elements. Equivalence, in fact, is the concept
of things being the ‘same’ or ‘alike’ in some way.

DEFINITION: yRx is an equivalence relation in a set S if the
properties:

(1) Reflexive: xRx logically true;

(2) Symmetric: if yRx, then xRy;

(3) Transitive: if zRy and yRx, then zRx;

hold for any x, y and z in S.

The emphasis here is on the statement yRx of equivalence, rather
than on the set R of pairs (x, y) of equivalent elements of S within
the set S . S of ordered pairs. However, the set R is always to be
remembered. A feature of equivalence is that R is symmetrical, i.e.
the ordering in this special case does not matter.


For example, let S be the set of positive rationals p/q (p and q
positive integers). Then (r/s) R (p/q) is an equivalence relation given by ‘the
integers ps and qr are equal’. It can be shown to satisfy all three
properties. Having established the equivalence, we usually write
R as =, i.e. r/s = p/q if ps = qr. Further, fixing one rational p/q we can find
all the rationals r/s equivalent (equal) to it. If 1/2 is fixed, the equivalent
rationals in S are 1/2, 2/4, 3/6, .... As a set, R can be written

R = {(p/q, r/s) | p/q ∈ S, r/s ∈ S, ps = qr}.


The concept of equivalence is, however, far wider. It is one of the
basic ideas in logic and mathematics. Consider the set S of all finite
sets A, B, C, .... Take the relation R of ‘equi-numerous’ as defined
in 4.5. Then the following all hold: ARA; if ARB, then BRA; if ARB
and BRC, then ARC. Write them with R replaced by ~ for
‘equi-numerous’: A~A; if A~B, then B~A; if A~B and B~C, then
A~C. From the counting aspect, equi-numerous sets are alike; they
are equivalent.
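
The three properties can be tested mechanically on a finite sample. The sketch below takes positive rationals written as unreduced pairs (p, q) and the relation ‘ps and qr are equal’; the sample (numerators and denominators from 1 to 6) and the names used are assumptions made here for illustration.

```python
from itertools import product

# Positive rationals written as pairs (p, q) of positive integers, not reduced.
S = [(p, q) for p in range(1, 7) for q in range(1, 7)]


def R(y, x):
    """y R x for y = r/s and x = p/q: the integers ps and qr are equal."""
    (r, s), (p, q) = y, x
    return p * s == q * r


reflexive = all(R(x, x) for x in S)
symmetric = all(R(x, y) for x, y in product(S, S) if R(y, x))
transitive = all(R(z, x) for x, y, z in product(S, S, S) if R(z, y) and R(y, x))

# All members of the sample equivalent (equal) to the rational 1/2.
equal_to_half = [x for x in S if R(x, (1, 2))]

print(reflexive, symmetric, transitive)
print(equal_to_half)
```

A check on a sample is an illustration of the properties, not a proof of them for all rationals.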


One thought may now occur to us. We have used the term ‘equivalence’
for statements p and q in the sense: if p then q and if q then p.
This is a relation R between p and q. Is it an equivalence relation, so
that our terminology is consistent? It is easily checked to be so, by
taking the three properties in turn:

Reflexive: pRp, i.e. ‘if p then p and if p then p’, which is true.

Symmetric: pRq and qRp are the same, i.e. ‘if p then q and if q
then p’ is the same as ‘if q then p and if p then q’,
which is so.

Transitive: from pRq and qRr we get pRr, i.e. ‘if p then q and if
q then p’ taken with ‘if q then r and if r then q’ gives
‘if p then r and if r then p’, which is the case.


Carrying out checking of this nature is not often necessary, since it is 
reasonably obvious in most cases of equivalence that the properties 
are valid. 

Equivalence has to do with the partitioning of a given set S. Of the 
various subsets of S which can be written, many are overlapping. 
Even if they are disjoint, they may or may not exhaust S between 
them. A partition of S is a set of subsets which are both disjoint and 
exhaustive of S. As an example, consider the set of 100 people 
classified according to their smoking habits, as given in 4.6. The 
seven subsets given in the original data overlap and do not exhaust 
the whole set of 100. The job undertaken in 4.6 is to get the numbers 
of elements common to the subsets and to derive the ‘remainder’ 
(the non-smokers). Partitions of the set S of 100 people can then be 
written in various ways. With the notation: A for cigarette smokers, 
B for cigar smokers, C for pipe smokers, one partition of S is: 


S = A + A'B + A'B'C + A'B'C'   (100 = 42 + 10 + 10 + 38)

i.e. S is all cigarette smokers + cigar smokers not smoking cigarettes + 
pipe smokers not smoking cigarettes or cigars+non-smokers. A 
similar partition is:

S = B + B'C + AB'C' + A'B'C'   (100 = 17 + 19 + 26 + 38).
Another partition is of a particular kind, a finer grouping of the first 
one: 

S = ABC + ABC' + AB'C + AB'C' + A'BC + A'BC' + A'B'C + A'B'C',

in which the first four subsets make up A and the next two make up A'B.


This finer grouping, a partitioning of S into 8 subsets, is the one dis- 
played in the Venn Diagram of 4.6. 

The connection between equivalence and partitioning follows from 
some simple propositions on an equivalence relation yRx in the (non-empty) set S = {x, y, z, ...}:

(a) If S_x is the set of y equivalent to a given x (yRx), then at least one (non-empty) S_x exists as a subset of S. Since S is not empty, it contains an element x and x ∈ S_x (xRx).

(b) If yRx, then S_x = S_y and conversely. The direct part is established: if yRx is given, then xRy also. Let z ∈ S_x, so zRx which with xRy gives zRy, i.e. z ∈ S_y. Hence S_x ⊆ S_y. This same kind of argument gives S_y ⊆ S_x. So: S_x = S_y. The converse: given S_x = S_y, it follows that y ∈ S_y (since yRy) and hence that y ∈ S_x. Hence yRx by the definition of S_x.

(c) If S_x ≠ S_y exist, then they are disjoint. Suppose S_x and S_y have a common element z, so that zRx and zRy. But zRy gives yRz, which (with zRx) gives yRx. By (b), S_x = S_y, a contradiction. Hence S_x and S_y are disjoint. Q.E.D.

Now consider all the sets like S_x which can be written and which are distinct. There is at least one by (a). If more than one, they are disjoint by (c). By (b), if x and y are equivalent (yRx) they go into the same S_x; otherwise, into distinct S_x and S_y. Finally, all the sets like S_x exhaust S since, if there were an element z left over, then S_z could be formed with z in it (zRz) and added to the sets already written. Hence:


THEOREM: An equivalence relation yRx in a set S determines a partition of S into disjoint and exhaustive subsets S_x. Two elements x and y of S are in the same S_x if and only if yRx.


The subsets S_x are called equivalence classes of S. All elements of S
which are equivalent to a given element x (and equivalent to each
other) go into one and the same S_x. An element x of S_x can be taken
as representative of all, and called the canonical form of S_x. All other
elements of S_x are equivalent to the canonical form x. Note that the
partition may consist of only one equivalence class (if all elements of 
S are equivalent), and that it may include equivalence classes which 
have only one member. Some examples follow: 

(i) S = {A, B, C, ...} comprising finite sets. The equivalence relation is A ~ B, or ‘A has the same number of elements as B’. The partition of S is into equivalence classes S₁, S₂, S₃, ... where S_r comprises all sets with r elements.

(ii) S is the set of 100 people classified by smoking habits (4.6). 
The equivalence relation is ‘x and y have the same smoking habits’,
meaning that x and y are alike in smoking cigarettes (or not), cigars 
(or not), pipe (or not). The partition of S is into 8 subsets as shown 
above. The first subset ABC consists of all those equivalent people 
who smoke all three; and similarly for the others. 

(iii) S is the set of 16 tribal members of 4.1, example (iii). The 
equivalence relation is yRx for ‘x and y are of the same generation’. 
S is partitioned by generations into an equivalence class of 4 people 
(first generation), another of 8 people (second generation) and a 
third of 4 people (third generation). 

(iv) The group J of integers under addition. The equivalence relation is ‘x and y have the same remainder on division by 3’. The partition of J is:

Subset                                  Canonical form
J₀ = {..., −6, −3, 0, 3, 6, ...}        0
J₁ = {..., −5, −2, 1, 4, 7, ...}        1
J₂ = {..., −4, −1, 2, 5, 8, ...}        2

This is linked with the group of integers (mod 3) under addition. The set of equivalence classes is {J₀, J₁, J₂}; the set of canonical forms is {0, 1, 2} which is the group of integers (mod 3).

(v) The cyclic group S = {i, i², i³, i⁴} = {i, −1, −i, 1} under multiplication. The absolute value of a complex number is

|a + ib| = √(a² + b²),

giving the equivalence relation: ‘x and y have the same absolute value’. Then S is itself one equivalence class.
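The theorem can be tried out on a finite sample. The sketch below is only an illustration: it assumes the equivalence relation is given in the form ‘x and y have the same key value’, and the helper name `partition` is hypothetical. It rebuilds the classes of example (iv) for the integers from −6 to 8.

```python
from collections import defaultdict

def partition(elements, key):
    """Group elements into equivalence classes, where x ~ y means key(x) == key(y)."""
    classes = defaultdict(list)
    for x in elements:
        classes[key(x)].append(x)
    return dict(classes)

# Example (iv): same remainder on division by 3 (Python's % gives 0, 1 or 2 here).
sample = list(range(-6, 9))
classes = partition(sample, key=lambda n: n % 3)

for canonical_form, members in sorted(classes.items()):
    print(canonical_form, members)

# The classes are disjoint and exhaustive: together they give back the sample.
all_members = sorted(m for members in classes.values() for m in members)
```

Each key plays the part of a canonical form, and `all_members == sample` confirms that nothing is left over or counted twice.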


7.3. Functions and mappings. A relation y Rx from the set X to the set 
Y is such a general concept that it does not imply that, for each x in 
the domain of X, there is just one y in the range of Y ; there may well 
be several such y’s. For example, if x and y are real numbers, the relation x + y = 1 does give one y for each x; but y² = x gives no y when x is negative, one y (i.e. y = 0) when x = 0 and two y’s (y = ±√x) when x is positive; and x + y < 1 is a relation in which (infinitely) many y’s correspond to each x.

A relation f such that just one y corresponds to each x in the 
domain is of the greatest importance; it is called a functional relation 
or more simply a function. The term ‘function’ can be used, when there 
is no ambiguity, to indicate both the set of the functional relation f 
and the statement yfx (see 9.1 below). So, yfx can be read ‘y is a
function of x’ and replaced by the more usual y = f(x).


DEFINITION: A function f from the set X to the set Y is a relation specified by the statement yfx such that, if y ∈ Y exists for a given x ∈ X, then y is unique. The statement yfx can be written y = f(x), read ‘y is a function of x’.

In full, in terms of sets, f is the set {(x, y) | x ∈ X, y ∈ Y, y = f(x)}.
As a further essential notation, the function f is defined on the 
domain {x | there exists y such that y=f(x)}, a subset of X; and the 
range of f is {y | there exists x such that y=f(x)}, a subset of Y. 

Usually, with little loss of generality, the set X can be limited to 
the domain of f. We say then: y =f (x) defined on X. The range is still 
a subset of Y consisting of y’s such that y = f(x) for some x in X.
There is a unique y in the range for each specified x and the function 
y=f (x) is often called single-valued.* The converse is not generally 
true; for each specified y in the range of y=f(x), there may well be 
more than one x to correspond. Some examples illustrate: 

(i) y=2x defined on the domain X of all positive integers. This 
gives one y for each x (as required) and the range is the set of even 
positive integers. It also gives one x (i.e. x = ½y) for each y in the range.

(ii) y = x² defined on the domain X of all real numbers. This gives one y for each x (as required) and the range is the set Y of non-negative real numbers. For each y ≠ 0 in the range, there are two x’s to correspond (x = ±√y). This function also becomes single-valued both ways if it is defined on the domain X of all non-negative real numbers, which is also the range (y = x², x = +√y).

(iii) A function need not be algebraic or analytical. It can very 
well be non-analytical. So: y = f(x) where ‘y is the wife of x’ is a
perfectly respectable function under monogamy. This is so in the 
tribe of 7.1, example (iv). 
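For a finite relation written as a set of pairs (x, y), the definition can be checked mechanically. The following is a minimal sketch; the name `is_function` and the two sample relations are illustrative assumptions, not notation from the text.

```python
def is_function(pairs):
    """True if no x in the relation is paired with two different y's."""
    image = {}
    for x, y in pairs:
        if x in image and image[x] != y:
            return False  # a second, different y for the same x
        image[x] = y
    return True

# y = 2x on {1, ..., 5}: one y for each x.
doubling = {(x, 2 * x) for x in range(1, 6)}

# y^2 = x, listed as pairs (x, y): two y's for each positive x.
square_relation = {(y * y, y) for y in range(-3, 4)}

print(is_function(doubling))         # True
print(is_function(square_relation))  # False
```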


* The definition here does not cover what is called a multi-valued function
(e.g. x² + y² = 1, x a real number). These need to be split into single-valued branches.


A function can be viewed from a rather different angle if X and Y 
comprise real numbers. A relation, as a subset of the Cartesian
product X × Y, is shown as a subset of points in the plane Oxy. A
function is represented by a particular kind of subset of points; all 
lines parallel to Oy (and corresponding to the domain of X) intersect 
the subset of points in just a single point. The representation of a 
function is a more or less recognisable ‘curve’ proceeding from left to 
right over the domain of X. This is the graphical aspect of a function. 

An equally important diagrammatic aspect is that of a function 
as a mapping. This merits a separate definition: 


DEFINITION: The function y = f(x), defined on X and with a subset of Y as range, gives a mapping of the set X into the set Y, denoted f: X → Y, such that to each x ∈ X there is a unique image f(x) ∈ Y.


It must be stressed that a mapping is the same thing as a function; 
nothing new is introduced. At the same time, a mapping is often a 
good way of looking at a function, a very convenient expression of 
the relation involved. Functions, though definable for sets of any 
kind, are so often used for numbers (numerical variables) that it is 
difficult to avoid thinking that the association is inevitable. There can be no such inhibition for mappings. A set X of any kind can be mapped into another set Y of the same or different kind. For example, the mapping y = 2x of the set X, of positive integers {1, 2, 3, ...}, into the set Y of positive integers, can be shown as in Fig. 7.3. Each element of X has its unique image in the set Y. Exactly the same diagram might serve (for example) for the mapping y = wife of x, where X is the set of married men in a tribe and Y the set of women. Other kinds of diagrams can be drawn for mappings (e.g.) of a set of points in three dimensions into a set of points in two dimensions. In a mapping, all we require of f(x) is that it lays down a definite rule for getting from x in X to its unique image y in Y.

Fig. 7.3 [arrow diagram of the mapping y = 2x, each x joined to its image 2x]

The following distinctions can usefully be made about a mapping f and about the corresponding function y = f(x):

(a) The whole of the set X is here taken as the domain so that each x ∈ X has its unique image y ∈ Y. The converse is not necessarily true; two or more x ∈ X can correspond to a single specified image y ∈ Y.
The mapping is generally many-one. It is a special case when the 
mapping is one-one. 

(b) The sets X and Y are generally different. It is a special case 
when X is mapped into itself. 

(c) The range of y=f(x) is a subset of the set Y; generally it will 
be a proper subset and the mapping is of X into Y. It is a special case 
when the range is the whole of Y and the mapping is of X onto Y. 

These distinctions can be illustrated: 

(iv) y = 2x is a mapping of X = {1, 2, 3, ...} into itself, as illustrated. The mapping is both one-one and into. The range of y is a proper subset of X. Again, y = x² is a mapping of the set X of all real numbers into itself, or onto the set Y of non-negative real numbers. In each case, the mapping is two-one (except at x = 0, which alone has the image 0).

(v) Consider the set X of three objects {A, B, C} and the permuta- 
tion: A to C, B to A, C to B. This is a one-one mapping f of X onto 
itself, the rule for getting images being: 

f(A) = C, f(B) = A, f(C) = B.
Any mapping of a finite set onto itself must be one-one (and so a
permutation).

(vi) X is the set of (currently) married men, and Y the set of all 
women, in a tribe. Consider the relation ‘y is the wife of x’. In a 
monogamous tribe, this is a one-one mapping of X into Y. If the 
conjugal convention is polyandry, it is a many-one mapping of X 
into Y. Under polygamy, however, the relation is not a function 
and there is no mapping; a man’s image (wife) is then not 
unique. 
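The distinctions (a) to (c) can be tested on finite sets. This is a sketch only: the helper names `is_one_one` and `is_onto` are hypothetical, and the infinite sets of example (iv) are cut down to small finite samples.

```python
def is_one_one(f, domain):
    """True if distinct elements of the domain have distinct images."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def is_onto(f, domain, codomain):
    """True if the range of f is the whole of the codomain."""
    return {f(x) for x in domain} == set(codomain)

# Example (v): the permutation A to C, B to A, C to B.
perm = {"A": "C", "B": "A", "C": "B"}
X = ["A", "B", "C"]
perm_one_one = is_one_one(lambda x: perm[x], X)
perm_onto = is_onto(lambda x: perm[x], X, X)

# y = 2x on {1, ..., 5}, taken as a mapping into {1, ..., 10}: one-one, not onto.
double_one_one = is_one_one(lambda x: 2 * x, range(1, 6))
double_onto = is_onto(lambda x: 2 * x, range(1, 6), range(1, 11))

# y = x^2 on {-3, ..., 3}: many-one, but onto its range {0, 1, 4, 9}.
square_one_one = is_one_one(lambda x: x * x, range(-3, 4))
square_onto = is_onto(lambda x: x * x, range(-3, 4), [0, 1, 4, 9])
```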

Consider the concept of one-one correspondence, defined in 4.5.
In a general sense, a correspondence between two sets X and Y is 
many-many. It is only useful, however, if it is many-one, in which 
case it is a function or mapping. The particular case of a one-one 
correspondence is that of a one-one mapping. The notation for mappings can usefully be extended:

(a) If y = f(x) is a many-one mapping of X into Y, write f: X → Y and for particular images y₁ = f(x₁), y₂ = f(x₂), ... write x₁ → y₁, x₂ → y₂, ....

(b) If y = f(x) is a one-one mapping of X into Y, write f: X ↔ Y, and for particular images y₁ = f(x₁), y₂ = f(x₂), ... write x₁ ↔ y₁, x₂ ↔ y₂, ....
As a final example of a many-one correspondence (mapping), 
consider the link between letters and digits on the dial of a London 
telephone instrument: 


Letters:  (none)  ABC  DEF  GHI  JKL  MN  PRS  TUV  WXY  OQ
Digit:      1      2    3    4    5    6   7    8    9    0


with Z not used. There is no correspondence between 26 letters 
and 10 digits. If Z is omitted, there is a correspondence. The set 
of 25 letters is mapped into the set {1, 2, 3, ..., 9, 0} of digits, and 
onto the set {2, 3, ... 9,0}, the mapping being many-one. Equally, 
the set of words (in a dictionary or place-name sense) is mapped into 
a set of sets of digits. The mapping, being many-one, is not useful 
to the telephone company. The object of their exercise is to get a 
subset of words such that the first three letters of the words have a 
one-one mapping into the set of digit triples. For example: EASt → 327
and EALing → 325; both these words can go into the required
subset. But DARtford → 327 also; this word must be eliminated in
defining the subset for the one-one mapping into digit triples. 
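The dial mapping is small enough to write out in full. The sketch below takes the letter groups from the dial shown above; the names `dial_code` and `collisions` are hypothetical helpers, and the three exchange names are those quoted in the text (with EASt as the first, the reading that gives the digits 327).

```python
from collections import defaultdict

# Letter groups on the dial, with Z not used and no letters on 1.
DIAL = {"ABC": 2, "DEF": 3, "GHI": 4, "JKL": 5, "MN": 6,
        "PRS": 7, "TUV": 8, "WXY": 9, "OQ": 0}
LETTER_TO_DIGIT = {letter: digit
                   for letters, digit in DIAL.items()
                   for letter in letters}

def dial_code(name):
    """Digit triple for the first three letters of an exchange name."""
    return "".join(str(LETTER_TO_DIGIT[c]) for c in name[:3].upper())

def collisions(names):
    """Digit triples that are the image of more than one name."""
    by_code = defaultdict(list)
    for name in names:
        by_code[dial_code(name)].append(name)
    return {code: group for code, group in by_code.items() if len(group) > 1}

clashes = collisions(["EASt", "EALing", "DARtford"])
print(clashes)
```

A subset of names is suitable for the one-one mapping exactly when `collisions` returns an empty dictionary for it.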


7.4. Isomorphism. A main concern of modern mathematics is with 
systems which may have different labels but which cannot be distinguished in their relevant properties. In the words of Poincaré (1854-1912): ‘mathematics is the art of calling different things by the same name’. The point can be made with a simple example. Let X be a set of temperatures x° C. and Y the set of corresponding temperatures y° F.; there is a function or mapping: y = 32 + (9/5)x. From the point of view of order (<), the sets X and Y are the ‘same’: if x₁ < x₂ for two members of X, then y₁ < y₂ in Y. However, if we try to add and multiply temperatures, the sets are not algebraically the ‘same’ at all: if x₂ = 2x₁, it does not follow that y₂ = 2y₁. Hence, X and
Y are the ‘same’ as long as we do no more than order temperatures. 
They are ‘different’ if we attempt to get sums or products of tem- 
peratures, which is why we do not do this. 
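The temperature example can be checked numerically. This is a small sketch using the usual conversion y = 32 + (9/5)x; the function name `to_fahrenheit` and the sample temperatures are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def to_fahrenheit(x):
    """Image y (degrees F.) of a temperature x (degrees C.), kept exact."""
    return 32 + Fraction(9, 5) * x

celsius = range(-10, 41, 5)

# Order is preserved: x1 < x2 exactly when y1 < y2.
order_preserved = all(
    (a < b) == (to_fahrenheit(a) < to_fahrenheit(b))
    for a, b in product(celsius, celsius)
)

# Doubling is not preserved: 20 is twice 10, but 68 is not twice 50.
doubling_preserved = to_fahrenheit(20) == 2 * to_fahrenheit(10)
```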

The concept of algebraic sameness or similarity is an important one, and technically it goes by the forbidding name of isomorphism.† In making the concept explicit and formal, we may appear to be doing no more than stress the obvious. But we find that progress is much easier, and more sure, if we are explicit. For sets X and Y in which a binary operation * is specified, the definition is:


DEFINITION: An isomorphism is a one-one mapping f: X ↔ Y of a set X onto a set Y which preserves the operation *: if x₁ has image y₁ and x₂ image y₂ under f, then x₁ * x₂ has image y₁ * y₂.

The condition of preservation of * can also be put: if x₁ ↔ y₁ and x₂ ↔ y₂, then (x₁ * x₂) ↔ (y₁ * y₂). More shortly, for the mapping y = f(x):

f(x₁ * x₂) = f(x₁) * f(x₂)

each side being y₁ * y₂. The operation * is usually + or ×. For sums: if x₁ ↔ y₁ and x₂ ↔ y₂, then (x₁ + x₂) ↔ (y₁ + y₂); or

f(x₁ + x₂) = f(x₁) + f(x₂).

The condition is similar for products.

The term isomorphism refers to the mapping of X onto Y. We can 
then say that Y is the isomorphic image of X or (simply) that X and 
Y are isomorphic sets. Speaking loosely, we imply that isomorphic 
sets are substantially the same, indistinguishable from the point of 
view of the operation in question (though perhaps not for others). 
The sets are denoted differently and have different interpretations, 
but they behave algebraically in the same way. They can be described 
as ‘equivalent up to an isomorphism’ and, to indicate this, we can
write X ≅ Y.

The first example shows that sets can be isomorphic with respect 
to two operations at the same time: 

(i) {1, 2, 3, ...} ≅ {1/1, 2/1, 3/1, ...}, both for sums (+) and for products (×). One set is J+, the natural numbers or positive integers. The other set is a subset of the set R of all rationals. That they are isomorphic is a consequence of the definition of rationals from integers (2.6). The isomorphism is the justification for writing the rational 3/1 = the integer 3, and for saying that the integers are included in the rationals: J ⊂ R. From the definition (4.6), the set of finite cardinal
numbers is also isomorphic with the set J+ of positive integers. Hence, 


† ‘Isomorphism’ is derived from the Greek: isos = equal, and morphe = form.


for purposes of sums and products, the positive integer n, the rational 
number n and the cardinal number n are interchangeable. 

The next example illustrates the concept for a relation (that of 
order), as well as for an operation such as + or x ; it also shows that 
we must always be careful to see that there is an isomorphism: 

(ii) X = {1, 2, 3, ...} and Y = {2, 4, 6, ...}. A one-one mapping from X onto Y is given by y = 2x (x a positive integer). The question is: what operations or properties does the mapping preserve? It does preserve both the property of order (<) and the operation of summation (+). Suppose n₁ ↔ 2n₁ and n₂ ↔ 2n₂. Then n₁ < n₂ implies 2n₁ < 2n₂. Further, (n₁ + n₂) ↔ 2(n₁ + n₂) = 2n₁ + 2n₂ is implied. The mapping does not preserve the operation of multiplication (×), for: if n₁ ↔ 2n₁ and n₂ ↔ 2n₂, then n₁n₂ ↔ 2n₁n₂ ≠ 2n₁ × 2n₂. For instance: the image of 3 is 6, the image of 4 is 8 and the image of 7 is 14 (adding both sides), but the image of 3 × 4 = 12 is 24 ≠ 6 × 8. Hence X ≅ Y for order and for sums, but not for products.
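What the mapping y = 2x does and does not preserve can be confirmed over a finite range. The check below is a sketch; the sample range of 1 to 20 is an arbitrary assumption, and a finite check illustrates the general statement rather than proving it.

```python
from itertools import product

def f(n):
    """The mapping y = 2x of example (ii)."""
    return 2 * n

X = range(1, 21)
pairs = list(product(X, X))

preserves_order = all((a < b) == (f(a) < f(b)) for a, b in pairs)
preserves_sum = all(f(a + b) == f(a) + f(b) for a, b in pairs)
preserves_product = all(f(a * b) == f(a) * f(b) for a, b in pairs)

# The instance quoted in the text: the image of 3 x 4 is 24, not 6 x 8.
image_of_product = f(3 * 4)
product_of_images = f(3) * f(4)
```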

The following example shows that an isomorphism can exist within 
a single set X, i.e. a one-one mapping of X onto itself, preserving an 
operation :* 

(iii) In the set C of complex numbers, the element x + iy can be put into one-one correspondence with its conjugate x − iy, thus defining a one-one mapping of C onto itself. The mapping preserves addition:

Image of {(x₁ + iy₁) + (x₂ + iy₂)}
    = Image of {(x₁ + x₂) + i(y₁ + y₂)}
    = (x₁ + x₂) − i(y₁ + y₂)
    = (x₁ − iy₁) + (x₂ − iy₂)
    = Image of (x₁ + iy₁) + Image of (x₂ + iy₂).

Similarly, it can be shown to preserve multiplication. Hence, both for + and for ×, the set of complex numbers is isomorphic with the set of their conjugates.
* It is then called an automorphism.

Two further examples illustrate that an isomorphism can be with respect to one operation in the first set and a different operation in the second set, and that the sets need not have numbers as elements:

(iv) The set J = {n | n an integer} under + is isomorphic with the set S = {2ⁿ | n an integer} under ×. The isomorphism is the one-one mapping n ↔ 2ⁿ, i.e. in f: J ↔ S the image of n in J is f(n) = 2ⁿ in S. If n₁ ↔ 2^n₁ and n₂ ↔ 2^n₂, then n₁ + n₂ ↔ 2^(n₁+n₂) = 2^n₁ × 2^n₂. So addition of n’s in J corresponds in the mapping to multiplication of 2ⁿ’s in S. The isomorphic sets can be spelled out to show corresponding elements:


{..., −3, −2, −1, 0, 1, 2, 3, ...} ≅ {..., 1/8, 1/4, 1/2, 1, 2, 4, 8, ...}

where, for example, −3 corresponds to 2⁻³ = 1/8.


The set J of integers under + is isomorphic, not only with the set 
of integral powers of 2, but generally with the set of integral powers 
of any real number under × (7.9 Ex. 15). J under + is also isomorphic
with other kinds of sets under ×, e.g. certain sets of transformations
with × taken as repeated application (7.9 Ex. 16).
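The correspondence between the integers under + and the powers of 2 under × can be verified exactly with rational arithmetic. The sketch below assumes a finite window of integers from −5 to 5; `power_of_two` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

def power_of_two(n):
    """Image of the integer n: 2 to the power n, as an exact rational."""
    return Fraction(2) ** n

J = range(-5, 6)

# Addition in J corresponds to multiplication of the images.
sum_to_product = all(
    power_of_two(a + b) == power_of_two(a) * power_of_two(b)
    for a, b in product(J, J)
)

# The mapping is one-one on the window: 11 integers give 11 distinct images.
distinct_images = len({power_of_two(n) for n in J})
```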

(v) {(x + iy) | x and y real} ≅ {P | P a point in a plane}, for the operation + between complex numbers and the operation of resultant of vectors OP in a plane. This isomorphism, between the set
of complex numbers and the set of all points in a plane, follows from 
the development of 2.5. It is the justification for showing complex 
numbers or ordered pairs (x, y) as points on an Argand Diagram. 

The most important applications of isomorphisms are to groups 
and the group operation (usually + or ×). A group isomorphism is a one-one mapping of a group G onto a set Ḡ, preserving the operation * of G. It can be established that Ḡ is also a group and that the isomorphism carries over both the identity element and inverses from G to Ḡ. The proof is:

Write G = {a, b, c, ...} and Ḡ = {ā, b̄, c̄, ...} where ā is the image of a, b̄ of b, .... By the isomorphism: ā * b̄ is the image of a * b. Ḡ is closed under * since G is. Since a * (b * c) = (a * b) * c in G, it follows that their images ā * (b̄ * c̄) and (ā * b̄) * c̄ are equal in Ḡ (associative). The identity e of G has an image ē in Ḡ; since e * a = a, then ē * ā = ā for the images, i.e. ē is the identity of Ḡ. Finally, the inverse a⁻¹ of G has an image (a⁻¹)¯ in Ḡ; since a⁻¹ * a = e, the same is true of images: (a⁻¹)¯ * ā = ē, i.e. (a⁻¹)¯ = ā⁻¹, the inverse of ā in Ḡ. Hence Ḡ is a group with the properties mentioned. So:


THEOREM: A group isomorphism carries a group G into another group Ḡ as image, carries the identity of G into the identity of Ḡ and carries an inverse in G into the corresponding inverse in Ḡ.


It is in this way that one group can be related to, or created from, 
another group. 

Isomorphic groups (or fields) are indistinguishable, from the 
algebraic point of view of the operations concerned. Some of the 
examples above have group isomorphisms; two other examples 
follow: 

(vi) The cyclic group {1, ω, ω²} of the cube roots of unity is isomorphic with the group of rotations of the equilateral triangle. See 6.4 (ii) above. The group operator is multiplication (×) of complex numbers in one case, successive rotations in the other. Each of them is also isomorphic with the group {0, 1, 2} (mod 3) under the group operation of addition (+). The mapping here is similar to that of example (iv) above. The images in {1, ω, ω²} ≅ {0, 1, 2} are: 1 ↔ 0, ω ↔ 1, ω² ↔ 2. Since ω⁰ = 1, the exponents of powers on the left are numbers on the right. Then any product of elements on the left is the image of the corresponding sum on the right; e.g.

ω × ω² = ω³ = 1 is the image of 1 + 2 = 3 = 0

using ω³ = 1 and modulo 3 respectively.
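The correspondence between the cube roots of unity and the integers (mod 3) can be checked in floating-point complex arithmetic. This is a sketch: the primitive cube root is taken as exp(2πi/3), and comparisons use a small tolerance because the arithmetic is not exact.

```python
import cmath
from itertools import product

OMEGA = cmath.exp(2j * cmath.pi / 3)  # a primitive cube root of unity

def root(k):
    """Image of k in {0, 1, 2}: the k-th power of omega."""
    return OMEGA ** k

def close(u, v):
    return cmath.isclose(u, v, abs_tol=1e-9)

# A product of roots is the image of the corresponding sum (mod 3).
product_matches_sum = all(
    close(root(a) * root(b), root((a + b) % 3))
    for a, b in product(range(3), range(3))
)

# The instance in the text: omega x omega^2 = 1, the image of 1 + 2 = 0 (mod 3).
omega_times_omega_squared = root(1) * root(2)
```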


+     | Even  Odd          +  | 0  1
Even  | Even  Odd          0  | 0  1
Odd   | Odd   Even         1  | 1  0

×     | Even  Odd          ×  | 0  1
Even  | Even  Even         0  | 0  0
Odd   | Even  Odd          1  | 0  1


(vii) {0, 1} (mod 2) ≅ {Even, Odd} both for addition (+) and for
multiplication (x). See 6.5, example (vi). The result follows from 
the addition and multiplication tables shown, together with a speci- 
fication of the mapping: 


0 ↔ Even;  1 ↔ Odd


which preserves both + and ×. This is a double group isomorphism.
Indeed, it can be called a field isomorphism, i.e. a one-one mapping
preserving both + and x, carrying over both identities (0 and 1) and 
both inverses (negatives and reciprocals). In this simplest kind of 
field, of two elements, the zero elements (0, Even) correspond, and 
so do the unity elements (1, Odd). Each is its own inverse. 


×   |  1  −1
 1  |  1  −1
−1  | −1   1


Notice that the group {1, −1} under multiplication is isomorphic with the group {0, 1} (mod 2) under addition, and similarly with the group {Even, Odd} under addition, as shown by the multiplication table for {1, −1}. The mapping is:

1 ↔ 0 (Even);  −1 ↔ 1 (Odd).
The set {1, — 1} is not a group under addition and group isomorphism 
does not arise. 
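Both statements about the two-element systems can be confirmed by running through the tables. The sketch below uses hypothetical helper names (`parity`, `sign`) and a small sample of integers for the Even/Odd side.

```python
from itertools import product

def parity(n):
    """0 for an even integer, 1 for an odd integer."""
    return n % 2

sample = range(-4, 5)

# Even/Odd behaves like {0, 1} (mod 2) for both + and x.
parity_preserves_sum = all(
    parity(m + n) == (parity(m) + parity(n)) % 2 for m, n in product(sample, sample)
)
parity_preserves_product = all(
    parity(m * n) == (parity(m) * parity(n)) % 2 for m, n in product(sample, sample)
)

def sign(k):
    """Image of k in {0, 1}: 1 for k = 0 and -1 for k = 1."""
    return (-1) ** k

# {0, 1} under addition (mod 2) corresponds to {1, -1} under multiplication.
sign_preserves = all(
    sign((a + b) % 2) == sign(a) * sign(b) for a, b in product(range(2), range(2))
)
```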


7.5. Linear transformations. In 6.3 and 6.4, we considered transformations, with the emphasis on the effect of different transformations, e.g. r(A), s(A), t(A), ..., on the same object A. A set of such transformations often has the properties of a group. We now consider a given transformation T, sending an object A into another object t(A), with attention directed to its effect on different objects A, B, C, ....

Changing the notation, we write the transformation T from one set X into another set Y as T: X → Y. This means that, if x ∈ X, then there is a unique image t(x) ∈ Y under the transformation T.

A transformation is simply another name for a function or mapping. 
Algebraically, it is y=t(x); y is a function of x. In diagrammatic 
terms, it is the mapping T of the set X into the set Y. In simple cases, the functional formulation is to be preferred to the alternative and equivalent expression as a transformation. For example, if X and Y are the same (each the set of positive integers), then the transformation y = 2x is such that each integer x is transformed into the even integer 2x; the mapping is of points on one line into points on another
line (7.3). Here, however, the function y=2x over the domain of 
positive integers is the most useful concept. 

The opposite may well be true when X and Y are sets of more complicated elements. Let X be the set of ordered pairs of real numbers (x₁, x₂) and Y another set of pairs (y₁, y₂). The transformation: y₁ = 2x₁ and y₂ = x₂ is again quite simple, corresponding to a magnification of figures (6.3, example (i) above). It is a mapping of points P(x₁, x₂) in the plane Ox₁x₂ into points Q(y₁, y₂) in the plane Oy₁y₂. Alternatively, it maps P(x₁, x₂) into points Q(2x₁, x₂) in the same plane Ox₁x₂. As a function, the transformation is expressed: the pair (y₁, y₂) is a function of the pair (x₁, x₂); this is not particularly convenient.*

More generally, a transformation may map a set of points P(x₁, x₂, ... x_n) in n dimensions into a set of points Q(y₁, y₂, ... y_m) in m dimensions. In algebraic terms, the transformation can be shown as giving each of y₁, y₂, ... y_m in terms of the x’s (x₁, x₂, ... x_n). This is usually more helpful than the functional form: y = f(x) where x stands for the n-tuple (x₁, x₂, ... x_n) and y for the m-tuple (y₁, y₂, ... y_m). The function f(x) is a function of an n-tuple or vector. It does not apply to the components separately; it does not mean that y₁ = f(x₁), y₂ = f(x₂), ....

This concept of a transformation is developed here in the simple case of a linear transformation in two dimensions. The set X of pairs (x₁, x₂) is transformed into the set Y of pairs (y₁, y₂):

y₁ = a₁₁x₁ + a₁₂x₂ and y₂ = a₂₁x₁ + a₂₂x₂   ......(1)

where the x’s and y’s are real numbers and where the a’s are real constants. One example is the simple magnification case of 6.3, i.e. y₁ = ax₁ and y₂ = x₂ for a constant a. More generally, (1) sends points (x₁, x₂) in the plane Ox₁x₂ into points (y₁, y₂) in the plane Oy₁y₂ in such a way that figures may be contracted, expanded and distorted in various directions. The only certain fixed point under the transformation is the origin O (x₁ = 0, x₂ = 0) which remains unchanged as O (y₁ = 0, y₂ = 0) in (1). Two examples illustrate:

(i) y₁ = x₁ + x₂ and y₂ = ½x₂.

The square ABCD in Ox₁x₂ has the following points as vertices, and the transformation sends them into the points A'B'C'D' in Oy₁y₂:

A(0, 1) → A'(1, ½)      C(2, 1) → C'(3, ½)
B(1, 0) → B'(1, 0)      D(1, 2) → D'(3, 1).


* But it is useful in one particular case, when the number pairs are complex numbers: y₁ + iy₂ as a function of x₁ + ix₂. Functions of a complex variable are examined in 7.6.


The shape of the square is changed (into a parallelogram) under the transformation (Fig. 7.5a). The transformation can be reversed, to give ABCD from A'B'C'D':

x₁ = y₁ − 2y₂ and x₂ = 2y₂.

The transformation (or mapping) is one-one and it has an inverse.

Fig. 7.5a

(ii) y₁ = x₁ + x₂ and y₂ = ½x₁ + ½x₂.

Start with the same square ABCD in Ox₁x₂. Then:

A(0, 1), B(1, 0) → A', B' (1, ½)
C(2, 1), D(1, 2) → C', D' (3, 3/2).

The parallelogram A'B'C'D' in Oy₁y₂ collapses to two points (Fig. 7.5b). Indeed, all points in Ox₁x₂ map into points on the line y₂ = ½y₁ in Oy₁y₂. The transformation cannot be reversed; given y₁ and y₂, the corresponding values of x₁ and x₂ cannot be written. The transformation (or mapping) is many-one, without an inverse.
This leads us to look for the inverse of (1). On solving (Appendix A.4):

x₁ = (a₂₂y₁ − a₁₂y₂) / (a₁₁a₂₂ − a₁₂a₂₁)   and   x₂ = (a₁₁y₂ − a₂₁y₁) / (a₁₁a₂₂ − a₁₂a₂₁).

This is a linear transformation from (y₁, y₂) to (x₁, x₂), provided that:

a₁₁a₂₂ − a₁₂a₂₁ ≠ 0   ......(2)


Hence the transformation (1) is one-one and has an inverse, provided 
that the constant coefficients in (1) satisfy the condition (2). This is 
so in example (i). In example (ii), the expression of (2) becomes zero; 
the condition is not satisfied. 
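The inverse and the condition (2) can be worked through in exact arithmetic. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the coefficients are taken as y₁ = x₁ + x₂, y₂ = ½x₂ for example (i) and y₁ = x₁ + x₂, y₂ = ½x₁ + ½x₂ for example (ii), which is a reading of the printed examples.

```python
from fractions import Fraction as F

def apply(a, point):
    """Send (x1, x2) into (y1, y2) by the linear transformation (1)."""
    (a11, a12), (a21, a22) = a
    x1, x2 = point
    return (a11 * x1 + a12 * x2, a21 * x1 + a22 * x2)

def determinant(a):
    """The expression a11*a22 - a12*a21 of condition (2)."""
    (a11, a12), (a21, a22) = a
    return a11 * a22 - a12 * a21

def inverse(a):
    """Coefficients of the inverse transformation, or None if (2) fails."""
    d = determinant(a)
    if d == 0:
        return None
    (a11, a12), (a21, a22) = a
    return ((a22 / d, -a12 / d), (-a21 / d, a11 / d))

example_i = ((F(1), F(1)), (F(0), F(1, 2)))
example_ii = ((F(1), F(1)), (F(1, 2), F(1, 2)))

square = {"A": (0, 1), "B": (1, 0), "C": (2, 1), "D": (1, 2)}
images_i = {name: apply(example_i, p) for name, p in square.items()}
images_ii = {name: apply(example_ii, p) for name, p in square.items()}

inverse_i = inverse(example_i)    # x1 = y1 - 2*y2, x2 = 2*y2
inverse_ii = inverse(example_ii)  # None: the transformation is singular
```

In example (ii) the vertices A and B share one image and C and D share another, which is the collapse to two points described in the text.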

Consider all the different transformations (1), subject to the condition (2), i.e. take the set of pairs of equations as the a’s are given all possible real numerical values which satisfy (2). The set can easily be checked to have the properties of a group under the operation × (successive applications of transformations from the set). The critical
property is the existence of an inverse for each transformation of the 
set, assured by the condition (2). Linear transformations (1) which 
satisfy (2) are called non-singular. The result is that non-singular 
linear transformations, with coefficients from the field of real num- 
bers, form a group under multiplication. It is an example of what is 
called a full linear group. 


7.6. Conformal transformations. If the number pair (y₁, y₂) is a function of the number pair (x₁, x₂), then as a transformation each of y₁ and y₂ is given in terms of x₁ and x₂. The linear case is (1) of 7.5 above. The expressions for y₁ and y₂ are, in general, specified separately and independently; there is no necessary connection between them. There are, however, particular cases where the two expressions are linked. One way of linking them is by combining x₁ and x₂ into a complex number (x₁ + ix₂) and by transforming into another complex number (y₁ + iy₂). The transformation then appears as a function of a complex variable. Further, since the two expressions in the transformation are linked, it is called a conformal transformation. The function can still be of any form but, to provide illustrations, only two simple cases of polynomial form are considered here.

The notation is changed so that the complex number from which we start is written z = x + iy, where x and y are real numbers. The transformation is specified by Z = f(z) where Z = X + iY is the transformed version of z = x + iy. On separating the elements of the
number pairs, X is expressed in terms of x and y, and Y in terms of 
x and y; this is the process of ‘equating real and imaginary parts’. 
The two expressions are linked through the form (here a polynomial) 
adopted for f. 

The simplest case is the linear function of a complex variable: Z = az + b where a and b are (real) constants. Hence:

X + iY = a(x + iy) + b = (ax + b) + i(ay)

and the conformal transformation is

X = ax + b and Y = ay   ......(1)


If the transformation (1) is regarded as a mapping of points (x, y) in the plane Oxy into points (X, Y) in the plane OXY, and if a > 0, then the mapping is a magnification in the direction Ox and the same magnification in the direction Oy, together with a simple shift (by b) in the Ox direction. The conformal nature of the transformation appears in the equal magnifications used. For example:


Z = (3/2)z − ½
or   X = ½(3x − 1) and Y = (3/2)y


corresponds to a 50 per cent magnification in both directions and a horizontal shift to the left (by ½). The square ABCD is blown up to the square A'B'C'D' as shown in Fig. 7.6a.


Fig. 7.6a          Fig. 7.6b


A simple non-linear function of a complex variable is: Z = ½z²
i.e.   X + iY = ½(x + iy)² = ½(x² − y²) + ixy
or   X = ½(x² − y²) and Y = xy   ......(2)


As a mapping of (x, y) in Oxy into (X, Y) in OXY, this transforma- 
tion distorts a figure composed of straight lines into a curvilinear 
figure. The conformal transformation (1) is linear and preserves 
straight lines; the conformal transformation (2) is non-linear and 
lines are sent into curves. Fig. 7.6b illustrates by showing the trans-
formation of the square ABCD into A′B′C′D′. To assist in the
identification, the mid-points KLMN are also shown, transformed
into K′L′M′N′. For example, CMD are collinear:

C(2,1) M(3/2, 3/2) D(1, 2). 
To get C’, put x=2, y=1 in (2) and obtain: 

        X = (2² − 1²)/2 = 3/2   and   Y = 2 × 1 = 2.
Hence, on working out for all three points:
        C′(3/2, 2)   M′(0, 9/4)   D′(−3/2, 2)

which are not collinear. The conformal nature of the transformation 
(2) appears in the preservation of a certain symmetry in the figures. 
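
The two mappings can be tried out numerically. The following is a minimal sketch, assuming Python's built-in complex type stands in for the number pair (x, y); the function names `linear`, `half_square` and `collinear` are illustrative only, and the default values a = 3/2, b = −1/2 are chosen to give a 50 per cent magnification with a shift to the left.

```python
# Sketch only: transformations (1) and (2) applied to the points C, M, D.

def linear(z, a=1.5, b=-0.5):
    """Transformation (1): Z = a*z + b with real constants a and b."""
    return a * z + b

def half_square(z):
    """Transformation (2): Z = z**2 / 2, i.e. X = (x^2 - y^2)/2, Y = x*y."""
    return z * z / 2

def collinear(p, q, r, tol=1e-12):
    """True when three points (as complex numbers) lie on one straight line."""
    # The imaginary part of conj(q - p) * (r - p) is the cross product
    # of the two displacements; it vanishes exactly for collinear points.
    return abs(((q - p).conjugate() * (r - p)).imag) < tol

C, M, D = complex(2, 1), complex(1.5, 1.5), complex(1, 2)

images_linear = [linear(z) for z in (C, M, D)]
images_square = [half_square(z) for z in (C, M, D)]

print(images_square)              # images C', M', D' under (2)
print(collinear(C, M, D))         # the original points
print(collinear(*images_linear))  # straight lines preserved by (1)
print(collinear(*images_square))  # lines sent into curves by (2)
```

Separating `images_square` into real and imaginary parts reproduces the coordinates worked out by hand for C′, M′ and D′.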


7.7. Order. In 6.7 ordering is taken first as a primitive concept, i.e.
a<b means a precedes b. For a field, in which differences are defined,
order becomes more precise; it is a property of positiveness: a<b
means (b − a) positive. The concept of a relation now provides a
more general notion of order.

A preliminary comment on the notation for order is appropriate. 
Though we do, in fact, define an ordering by reference to a relation 
denoted generally by R, we have always in mind that the order is 
according to < (less than) or ≤ (less than or equal to). There are
alternative and equivalent notations in use in either case. It is a
matter of choice whether we write a<b (a less than b) or b>a
(b greater than a); both notations mean precisely the same thing.
As long as we know what we are doing, it is a great convenience in
practice to be able to switch between a<b and b>a. In the same way,
we can interchange a≤b and b≥a at will. In developing basic con-
cepts, when there are quite enough real distinctions to keep in mind,
we must avoid this duplication of notation. Here we confine ourselves
to a<b or a≤b. Any relation R of an ordering is to be interpreted
either as ‘<’ or as ‘≤’.

S = {a, b, c, ...} may be ordered in one or other of three forms:

(i) Complete Ordering. There are two possibilities only:
        either a<b or b<a.
(ii) Partial Ordering. There are three possibilities:
        a<b or a=b or b<a.
(iii) Weak Ordering. There are four possibilities:
        a<b or a=b or b<a or a and b not comparable.


Write R for the relation of the ordering. R is complete if it holds, one
way or the other, between every pair of elements of S: either aRb or
bRa. Not all orderings have a complete relation, e.g. R is not com-
plete in a weak ordering. The negation R′ of R is the relation between
a and b which holds when a and b are not related by R, i.e. aR′b
means ~(aRb).* Consider each ordering in turn:

(i) In a complete ordering, take R as < so that aRb means a<b.
Then aR′b means a≮b, i.e. b<a, the only alternative. The relation
R is complete, since either aRb (a<b) or bRa (b<a) holds. Equally,
R′ is complete, since either aR′b (b<a) or bR′a (a<b) holds.

(ii) In a partial ordering, take R as ≤ so that aRb means a≤b.
Then aR′b means a≰b, i.e. b<a, the only remaining possibility.
Again R is complete, since either aRb (a≤b) or bRa (b≤a). But R′ is
not complete, since aR′b (b<a) and bR′a (a<b) do not exhaust the
possibilities (a=b not allowed for).

(iii) In a weak ordering, take R as ≤ again. Then aR′b itself covers
alternatives: a≰b means either b<a or a and b not comparable.
Neither R nor R′ is complete.

The concept of equivalence (7.2) is relevant to ordering. In a
complete ordering, no elements of S are related by equivalence. In a
partial or weak ordering, there are equivalent elements in S so that
S can be partitioned into equivalence classes. Two elements a and b
of S may belong to the same equivalence class, in which case a=b.
Where the two orderings differ is in the properties of elements not
in the same equivalence class. For a partial ordering, the equivalence
classes can themselves be ordered completely. Hence, if a and b are
not in the same equivalence class, then either a<b or b<a. This is not
so for a weak ordering. The equivalence classes cannot be put in a
complete order; there is always the possibility that a and b are not
comparable. See 7.9 Ex. 26.


7.8. Properties of order. The three properties of equivalence (7.2) are to
be examined in the wider context of ordering. One is the transitive pro-
perty: if aRb and bRc, then aRc. This is assumed to hold throughout.†


* In terms of sets, the relation R is a subset of all ordered pairs (a, b) of the Car- 
tesian product S . S, e.g. those for which a<b. Then R’ is the complementary subset, 
all pairs not related by R, e.g. those for which b<a. 

† Though not considered here, non-transitive relations are of considerable interest.
They may even describe what can be called ordering.

Another is the reflexive property: aRa logically true, i.e. if a
and b are identical, then aRb. For example, the relation ≤ is re-
flexive. The opposite property is irreflexive: aRa logically false, i.e.
if aRb, then a and b cannot be identical. For example, the relation <
is irreflexive.

The remaining property is concerned with any symmetry which
exists between aRb and bRa. This can best be regarded as a con-
sequence of the reflexive and transitive properties assumed for R.
There are two cases to consider. First, suppose that R is reflexive
and transitive. Then aRb and bRa may both hold, i.e. equivalence
of a and b (a=b) is possible. The typical case arises when R is ≤,
which is reflexive (a≤a) and transitive. It is possible that a≤b and
b≤a, meaning a=b. The set S has a subset which is an equivalence
class. Second, suppose that R is irreflexive and transitive. Then
aRb and bRa cannot both hold. For, if they do, then aRa by the
transitive property, and this is ruled out by the irreflexive property.
The typical case is when R is <, which is irreflexive (a<a false) and
transitive. It is not possible that a<b and b<a both hold. This is the
anti-symmetric property.

We can now write formal definitions, and obtain properties, for an
ordered set S = {a, b, c, ...} of elements of any kind:

(i) Complete Ordering. The set S is completely ordered by R if R is
an irreflexive, transitive and complete relation. Hence R is anti-
symmetric. The relation R can then be written <. Hence, for any a,
b and c in S:

Property          In terms of R:                 With R written <:
Irreflexive       aRa logically false            a≮a
Anti-symmetric    if aRb, then ~(bRa)            if a<b, then b≮a
Transitive        if aRb and bRc, then aRc       if a<b and b<c, then a<c
Complete          if ~(aRb), then bRa            if a≮b, then b<a

The negation R′ has precisely the same properties. This is because
aR′b means ~(aRb), i.e. bRa. With R written <, aRb is a<b and
aR′b is b<a; these are the only possibilities.

(ii) Partial Ordering. The set S is partially ordered by R if R is a
reflexive, transitive and complete relation. Hence symmetry is
possible, i.e. aRb and bRa may both hold (a and b equivalent). The
relation R can be written ≤:

Property       In terms of R:                                With R written ≤:
Reflexive      aRa logically true                            a≤a
Equivalence    if aRb and bRa, then a and b are equivalent   if a≤b and b≤a, then a=b
Transitive     if aRb and bRc, then aRc                      if a≤b and b≤c, then a≤c
Complete       if ~(aRb), then bRa                           if a≰b, then b≤a

The negation R′ is irreflexive, anti-symmetric and transitive, but not
complete. Since aR′b means ~(aRb), then, with R written ≤, aR′b
means b<a. The reason why R′ is not complete is that aR′b (b<a)
and bR′a (a<b) do not exhaust the possibilities; they omit the case
a=b.

(iii) Weak Ordering. The set S is weakly ordered by R if R is re-
flexive and transitive. Again symmetry is possible, i.e. aRb and
bRa may both hold (a and b equivalent). The relation R can be
written ≤:

Property       In terms of R:                                With R written ≤:
Reflexive      aRa logically true                            a≤a
Equivalence    if aRb and bRa, then a and b are equivalent   if a≤b and b≤a, then a=b
Transitive     if aRb and bRc, then aRc                      if a≤b and b≤c, then a≤c
Incomplete     if ~(aRb), then bRa may or may not hold       if a≰b, then b<a or a and b
                                                             are not comparable

The negation R′ is a relation with alternatives: aR′b means b<a or
a and b not comparable. R′ is irreflexive, anti-symmetric and transi-
tive. Neither R nor R′ is complete; there is always the possibility
a and b not comparable to confuse R and the possibility a=b to
confuse R′.
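
The three definitions can be tested mechanically on small finite sets. The following is a minimal sketch, assuming a relation is stored as a Python set of ordered pairs (a, b) and that completeness is checked only between distinct elements (an assumption made here so that an irreflexive relation can still be complete); the helper names and the three sample sets are illustrative, not part of the text.

```python
from itertools import product

def is_reflexive(S, R):
    return all((a, a) in R for a in S)

def is_irreflexive(S, R):
    return all((a, a) not in R for a in S)

def is_transitive(S, R):
    return all((a, c) in R
               for a, b, c in product(S, repeat=3)
               if (a, b) in R and (b, c) in R)

def is_complete(S, R):
    # Assumption: "aRb or bRa" is required for distinct a and b only.
    return all((a, b) in R or (b, a) in R
               for a, b in product(S, repeat=2) if a != b)

def classify(S, R):
    """Name the ordering of S by R, using the terms of this section."""
    if not is_transitive(S, R):
        return "not transitive"
    if is_irreflexive(S, R) and is_complete(S, R):
        return "complete ordering"
    if is_reflexive(S, R):
        return "partial ordering" if is_complete(S, R) else "weak ordering"
    return "none of the three"

def relation(S, holds):
    """Build the set of pairs (a, b) of S for which holds(a, b) is true."""
    return {(a, b) for a, b in product(S, repeat=2) if holds(a, b)}

numbers = [1, 2, 3]
less_than = relation(numbers, lambda a, b: a < b)

# Rationals kept as (numerator, denominator) with duplicates not eliminated.
ratios = [(1, 2), (2, 4), (3, 4)]
at_most = relation(ratios, lambda p, q: p[0] * q[1] <= q[0] * p[1])

# 'a divides b': 2 and 3 are not comparable.
divisors = [1, 2, 3, 6]
divides = relation(divisors, lambda a, b: b % a == 0)

print(classify(numbers, less_than))
print(classify(ratios, at_most))
print(classify(divisors, divides))
```

In the second sample the pairs (1, 2) and (2, 4) are related both ways, so they fall in one equivalence class; in the third, 2 and 3 are related neither way, which is what makes the ordering weak.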

The ordering of a set S = {a, b, c, ...} bears directly on the question
whether S can be scaled or not, i.e. on the question of the ‘ordinal
measurement’ of the elements of S. The scaling of S is achieved if S
is completely or partially ordered; no scaling is possible if S is weakly
ordered. A completely ordered S has no equivalence classes (of more
than one element) and there is no indifference relation between the
elements of S in the scaling. A partially ordered S has equivalence
classes so that, in scaling S, certain elements must be counted as
indifferent on the scale. A typical case is the set of rational numbers
which can be completely scaled if duplication of rationals is eliminated
but which is scaled with indifference relations if duplication is not
eliminated. For example, the rationals 1/2, 2/4, 3/6, ... form an indifference
subset in the scaling of the rationals.

The concept of isomorphism applies to ordered sets. The sets
X = {x₁, x₂, x₃, ...} and Y = {y₁, y₂, y₃, ...} are isomorphic or
similarly ordered if there is a one-one mapping x₁↔y₁, x₂↔y₂,
x₃↔y₃, ... which preserves ordering by the relation R:

        if x₁Rx₂, then y₁Ry₂.

For a complete ordering, x₁<x₂ implies y₁<y₂; for a partial ordering,
x₁≤x₂ implies y₁≤y₂. The same scaling can be applied to similarly
ordered sets.
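
The condition for similar ordering can be checked directly on finite sets. The following is a minimal sketch, assuming the mapping is given as a list of pairs (x, y) and that one and the same test `related` serves as R in both sets; the names `similarly_ordered`, `doubling` and `reversal` are illustrative only.

```python
def similarly_ordered(pairs, related):
    """True if pairs is a one-one mapping x <-> y preserving the relation."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    # One-one: no x and no y appears twice in the mapping.
    one_one = len(set(xs)) == len(xs) and len(set(ys)) == len(ys)
    # Order preserved: whenever x1 R x2 holds, y1 R y2 must hold too.
    preserves = all(related(y1, y2)
                    for x1, y1 in pairs
                    for x2, y2 in pairs
                    if related(x1, x2))
    return one_one and preserves

def less(a, b):
    return a < b

doubling = [(x, 2 * x) for x in range(1, 6)]   # x <-> 2x keeps the order
reversal = [(x, -x) for x in range(1, 6)]      # x <-> -x reverses it

print(similarly_ordered(doubling, less))
print(similarly_ordered(reversal, less))
```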


7.9. Exercises 
1. Take X = {1, 2} and Y = {1, 2, 3, 4, 5} and show that the relation
        R = {(x, y) | x ∈ X, y ∈ Y, y = x}
consists of 2 out of the 10 elements of X . Y. Replace X by {Even, Odd} and
show that X . Y has 10 elements with 5 in the relation R′ given by the state-
ment yR′x: ‘y is x’. Show that the domain of R and of R′ is the whole of X.

2. If X and Y are each the set of all real numbers, explain why the state-
ments y = |x| − 1 and y = √(x²) − 1 give the same relation R. Represent R
graphically.

3. Illustrate that a variable can be related to a discrete number by consider-
ing ‘y is the least integer not less than x’ for X = {x | x a real number, 0 < x ≤ 4}
and Y = {1, 2, 3, 4}. Show that X is the domain and Y the range of the relation.

4. Relations with compound statements. If X and Y each comprise all real
numbers, the relation R = {(x, y) | x ∈ X, y ∈ Y, x² + y² = 1, x > 0} is specified
by the conjunction of two statements. Compare with

        R′ = {(x, y) | x ∈ X, y ∈ Y, x² + y² = 1}
where X is all positive, and Y all real numbers. In what sense are R and R′
the same? Represent graphically as in Fig. 7.1b.

5. In the tribe of 4.1, example (iii), X is the set of 4 upper-class males and Y
the set of 3 lower-class males. A relation R is defined by ‘y is of the same
generation as x’. Show that R comprises 5 out of 12 pairs in X . Y.

6. Show that the relations R of Exs. 1 and 2 both give y as a function of x;
and that the relations R’ of Exs. 1 and 4 do not give y as a function of x but do 
give x as a function of y. Show that the correspondence is two-two in the 
relation of Ex. 5 and that no function is defined. 


7. Step-functions. The relation ‘y is the least integer not less than x’ is de-
fined on the domain X of all positive real numbers. Show that y is a function
of x with the property that one and the same value of y is given for all x in
0 < x ≤ 1, another single value of y for all x in 1 < x ≤ 2, and so on. Show graphi-
cally why this can be described as a step-function.

8. Show that ‘x and y have the same father’ is an equivalence relation in any
set of people. Consider the relation ‘y is the brother of x’, showing that it is
symmetric (but not reflexive) in a set of men but that it fails even to be sym-
metric in a set comprising men and women.

9. It might be argued that the reflexive condition is not needed in the
definition of an equivalence relation R: if xRy, then yRx (symmetry); but, if
xRy and yRx, then xRx (transitivity); i.e. xRx follows and need not be speci-
fied. But this means xRx if there is a y such that xRy, not xRx for all x. Illus-
trate by reference to the relation ‘x and y are both even’ in the set of all integers.

* 10. Circular relations. The condition: if xRy and yRz, then zRx, defines a
circular relation. Show that R is an equivalence relation if and only if it is both
reflexive and circular. Illustrate with ‘x and y have the same father’.

11. Prove the converse of the partitioning theorem of 7.2: if S is partitioned
into subsets Sᵣ, then an equivalence relation yRx is defined in S. To prove,
take yRx as ‘x and y belong to the same subset Sᵣ’ and establish that R satisfies
the conditions for an equivalence relation.

12. Show that the statement y = 2x gives a mapping of X = {1, 2, 3, ...} into
itself but a mapping of X = {x | x a real number} onto itself. In illustrating the
importance of specifying the domain of a function (mapping), indicate that the
difference here is due to the fact that X is a field in the second (continuous)
case, but not in the first (discrete) case.

* 13. Partition the set J of integers (a group under +) into equivalence
classes Jᵣ (r = 0, 1, 2, ... n − 1) by the relation ‘x and y have the same remainder
on division by n’. Show that the set of canonical forms is the set of integers
(mod n). The Jᵣ’s are residue classes (3.9 Ex. 18). Show also that J₀ is a sub-
group of J but not the others. The Jᵣ’s are cosets in the group J (6.9 Ex. 12).

* 14. Continuing, show that the set of residue classes Jᵣ is isomorphic with
the set of integers (mod n), preserving both + and ×, and deduce that the
Jᵣ’s make up a field if n is prime. For this, define:

        Jᵣ + Jₛ = set of sums of an element of Jᵣ and an element of Jₛ

and similarly for Jᵣ × Jₛ. Then show that Jᵣ + Jₛ = Jᵣ₊ₛ and Jᵣ × Jₛ = Jᵣₛ where
(r + s) and (rs) are sum and product respectively of r and s (mod n).

15. Show that {n | n an integer} under + is isomorphic with {αⁿ | n an
integer} under ×, for any real α. Deduce that 10ˣ under × behaves like x
under +, for x an integer (and more generally, see Chapter 12). This is the
basis of logarithms.

16. Group of translations. Consider the group n(A) of translations, i.e. a
shift of n to the right, where n is an integer (6.4). Show that this group under
× is isomorphic with the group J of integers under +.


17. Show that {0, 1, 2, 3, 4} (mod 5) ≅ {0, 2, 4, 6, 8} (mod 10) under + but
not under ×.
18. Cyclic groups. Show that {a, a², a³, ... aⁿ} where aⁿ = 1 (identity) under
× is isomorphic with the set of integers (mod n) under +.
19. Automorphism. If a and b are any rationals, show that

        {a + b√2} ≅ {a − b√2},

preserving both sums and products. This is an automorphism, a one-one
mapping of the field R(√2) onto itself.

* 20. Another derivation of complex numbers. Show that the field of poly-
nomials, mod x² + 1, is isomorphic with the set a + bx, where x² = −1, pre-
serving both + and ×. (Here the coefficients of the polynomials and the pair
(a, b) are to be taken as real numbers.) See 3.9 Ex. 19 and 20. Deduce that the
field of complex numbers can be defined from the integral domain of poly-
nomials f(x) over the field of real numbers, by taking remainders on division of
f(x) by x² + 1 and by interpreting x as i (i² = −1).

* 21. Homomorphism. Modify the definition of isomorphism (7.4): a
homomorphism is a many-one mapping F of X into Y preserving the operation *,

        F(x₁ * x₂) = F(x₁) * F(x₂).

An isomorphism is then a particular case of the
more general mapping of a homomorphism. Illustrate by showing that there is
a homomorphism between the group J of integers under + and the cyclic
group {1, i, −1, −i} under ×, the many-one mapping being n → iⁿ. Contrast
the set {iⁿ | n an integer} with the set {αⁿ | n an integer} which is isomorphic
with J for α real (Ex. 15).

22. Show that y, =3a, and y, =x, — x, has inverse x, =3y, and x, =2y, —Y., 
and that it sends the square of Fig. 7.5a into a parallelogram. 

23. Compare the transformation y₁ = a₁₁x₁ + a₁₂x₂ + b₁, y₂ = a₂₁x₁ + a₂₂x₂ + b₂
with (1) of 7.5, showing that it has the same effect on a figure in the plane
Ox₁x₂, except that it sends O into O′(b₁, b₂) in Oy₁y₂, i.e. it includes a shift or
change of origin. If the transformation is non-singular (a₁₁a₂₂ − a₁₂a₂₁ ≠ 0)
show that the inverse exists:

        x₁ = [a₂₂(y₁ − b₁) − a₁₂(y₂ − b₂)] / (a₁₁a₂₂ − a₁₂a₂₁);
        x₂ = [a₁₁(y₂ − b₂) − a₂₁(y₁ − b₁)] / (a₁₁a₂₂ − a₁₂a₂₁).


24. Affine group. Consider the set of non-singular transformations of Ex. 23,
for all real a’s and b’s (a₁₁a₂₂ − a₁₂a₂₁ ≠ 0). The subset for which b₁ = b₂ = 0 is a
group (the full linear group of 7.5). Show that the subset for which a₁₁ = a₂₂ = 1
and a₁₂ = a₂₁ = 0 is also a group, the group of translations (Ex. 16). Then show
that the complete set is a group which is not commutative. See 6.3, example
(i). The complete group is called an affine group.


25. The function Z = 1/z of a complex variable gives a conformal transforma-
tion, mapping (x, y) into (X, Y) by X = x/(x² + y²) and Y = −y/(x² + y²). Trace
the transformation of the figure ABCD of Fig. 7.6b in this case.


26. Weak ordering. Order a set S = {a, b, c, ...} according to the relation
aRb: ‘I like b at least as much as a’. Show that there are four possibilities for a
pair a and b: I like a and b equally (indifference, aRb and bRa both true), or I
like b better (b preferred, aRb true but not bRa), or I like a better (a preferred,
bRa true but not aRb), or I cannot choose (non-comparable, neither aRb nor
bRa true). Deduce that R gives a weak ordering of S.

27. The relation R is reflexive in the set S = {a, b, c, ...}. Show that

        (aRb | bRa)

is a relation aRb between a and b and that R is an equivalence relation in S.
Deduce that R serves to partition a partially ordered set S into equivalence 
classes, i.e. separating into classes the equal (a=b) members of S. See the 
second property of partial ordering in 7.8. 


CHAPTER 8 


GEOMETRIES 


8.1. Various geometries. The properties of figures in space are the 
concern of geometry and trigonometry.* We deal here mainly with 
‘plane geometry’ in two dimensions. More generally, we need to 
visualise figures in three dimensions and to imagine them in more 
than three dimensions. A vast amount of material has accumulated 
in geometry and, without referring to more than a small fraction, 
we can attempt to provide a logical basis for it all.

A first distinction is on the method of treatment in geometry. This 
can be synthetic in the sense of logical deduction from geometric 
axioms on the lines of the famous system of Euclid (circa 300 B.C.).
It can be analytic in the sense that points are associated with number 
pairs, with geometric properties translated into algebraic relations. 
There is then a unification of geometry and algebra or analysis; 
which swallows up the other is a matter for argument. 

Another distinction is concerned with the content of geometry, with 
the kind of properties to be established. Elementary geometry deals 
with the metric aspect of figures; it is designed to apply to the 
everyday world of distance, angle and area. The idea here is that the 
figures of metric geometry are unchanged in shape by certain trans- 
formations which may be described as ‘rigid motions’, i.e. transla- 
tions, rotations and reflections. It is not necessary that the figures 
should actually be ‘moved’. A reflection, for example, is the trans- 
formation corresponding to looking at a figure in a mirror. Moreover, 
translations and rotations can be expressed in terms of two or more 
reflections (8.9 Ex. 1). Hence in metric geometry, figures are invariant 
under certain transformations. Again geometry is linked to algebra, 
and in particular to groups of transformations. A vast extension of
geometry is made by following up the question: what properties of
figures are invariant under this or that group of transformations?
In this way, we are led to such subjects as projective geometry (8.7
below).

* ‘Geometry’ is derived from the Greek: ge = earth, and metria = measuring; simi-
larly, for measurement of triangles, ‘trigonometry’ comes from: tri = three, gonia =
angle, and metria = measuring.

A third distinction gets down to fundamentals, the axiomatic basis 
of geometry. We may think that, in metric geometry, we are talking 
about actual space, about actual points and lines. This is not so. 
We postulate abstract concepts of points and lines and develop 
abstract properties of these abstractions. In applications to everyday 
life, we must remember that points and lines on paper (and still 
more on the surface of the earth) are approximate realisations of 
abstract points and lines. The question is: what axiomatic basis is
appropriate? The idea that there is only one set of axioms and only 
one geometry has been in discard since Einstein developed his 
famous theory of relativity. It can no longer be said that Euclid’s 
axioms are correct and all others wrong, or even a waste of time. We 
can safely consider all kinds of axioms, and hence various geometries, 
Euclidean and non-Euclidean, both as interesting academic exer- 
cises and with the thought that they may even have practical 
uses. 

Is the axiomatic basis of geometry to be given in geometric or in 
algebraic terms? Until late in the nineteenth century, all respectable 
geometers followed a purely geometric formulation for metric and 
projective geometry alike, often going far out of their way to do so. 
The algebraic or analytic treatment was played down very severely ; 
it was useful for illustration and (sometimes) for getting at otherwise 
difficult proofs. More recently, a better balance has been struck 
between the purely geometric and the algebraic treatments. There 
is indeed a good case to be made out for an algebraic basis of all 
geometries, for geometry to be absorbed into algebra. 

We have already used the link between algebra and geometry, 
admittedly only for illustrative and graphical purposes. There is an 
isomorphism between a set of number pairs (x, y) and a set of complex 
numbers (x + iy), as a matter of straight algebra. However, either
can be associated (made isomorphic) with a set of points P, or with a
set of vectors OP, in a plane. We need to get the geometric aspect
here onto a firmer basis. There is clearly some risk of confusion, with
four different sets isomorphic one with another. We find that the
vector is the appropriate concept to take as a basis, to be related to a
point P and a number pair (x, y). Complex numbers and their repre-
sentation on an Argand Diagram are best left on one side, as a useful
but subsidiary application.


8.2. Metric geometry and vectors. We take for granted the main 
results of mensuration in elementary geometry and trigonometry 
(Appendix A.7 and A.8). 

One feature of metric geometry is that metric properties are 
invariant under the group of transformations of ‘rigid motions’, i.e. 
translations, rotations and reflections. If one of these transformations 
is applied to the vertices of a triangle ABC, sending A into A′, B into
B′ and C into C′, then the triangle A′B′C′ is congruent with the
triangle ABC.

Another feature is that there is a very considerable amount of
duality between points and lines.* The incidence of a point P and a
line p means both that P lies on p and that p goes through P. Further,
given two points P and Q, a unique line r joins P and Q, defining a
unique segment or length PQ along r. The dual result is: given two
lines p and q, a unique point R is obtained as the intersection of p
and q, defining a unique angle (p, q) between p and q. A line is the
dual of a point and an angle is the dual of a segment. A triangle is
specified either by three points ABC or (as the dual) by three lines abc.

There is one troublesome exception to this duality. Two points P
and Q always define a line r. On the other hand, two lines p and q
define a point R, except where p and q are parallel. It is clearly worth
while to eliminate this nuisance; this will be done later. There is also
a difficulty in definition, shown up in the duality between points and
lines. In defining the line r joining two given points P and Q (and the
length of the segment PQ), how do we know that we go from P to Q
to get the length PQ? We might start from P, go the wrong way and
never get to Q. Or, to put the matter another way, what points on r
are ‘between’ P and Q? Now look at the same difficulty in dual form.
In defining the point R of intersection of two given lines p and q, we
say we have a unique angle (p, q) between the lines. But which angle?
There are four of them and (even with opposite angles rated as equal)
two different ones. Or, how do we know which lines through R are
‘between’ p and q to make the angle (p, q)? Fig. 8.2a illustrates.
Clearly this concept of ‘betweenness’ is implicit in metric space, just
as ‘order’ is in the field of real numbers.

* Whenever there is no risk of ambiguity, we use ‘line’ for ‘straight line’.

The way out of the difficulty is to introduce the idea of a vector, a
concept uniting points and lines and embracing both length and
direction (angle). A vector has both given length and given direction but
with no fixed position; it refers to the relative position of two points. In
short, a vector is the relative and directed distance between two points
in space.* It is unchanged if the points are shifted (translated) in the
same way in space.

Fig. 8.2a

Hence, the vector PQ is the segment of line from P to Q, having
both length (p) and direction (from P to Q). Given P and Q, we now
know which way we are going, which points are between P and Q.
So we get the length p from P to Q in the vector PQ. The vector QP
is different: the same length p but in the opposite direction from Q
to P. We write QP = −PQ, one vector the negative of the other.
Similarly given two lines p and q intersecting in R, specify them as
vectors from P with given directions. Then we have a specified
(unique) angle α° for the angle (p, q), that measured anti-clockwise
from the (directed) p to the (directed) q. Fig. 8.2a illustrates.

* ‘Vector’ is a Latin word: vector, -oris = carrier, from veho, vexi, vectum = carry.
Hence the original idea of a vector is a displacement, carrying one point to another,
by a certain distance in a certain direction.

In the end, we find that vectors are basic for ‘space’ and
that they are a strictly algebraic concept. We can build the geo-
metry of space on a firm algebraic foundation. To do this, we first
look for the properties we wish vectors to have, and we then use
them, suitably abstracted, as the axiomatic basis for defining vec-
tor space. The properties (illustrated in Fig. 8.2b) are:

(i) Equivalence: PQ and P′Q′ are equivalent (equal) if they have the
same length and direction. So opposite sides of a parallelogram give
equivalent vectors. This expresses the idea of a vector as a directed
distance but with no fixed position.

(ii) Addition: if PQ is the diagonal of the parallelogram PRQS 
formed from the vectors PR and PS, then PQ is the sum of PR and 
PS: 

PQ=PR+ PS. 


The sum is the resultant of the vectors, an idea arising in mechanics, 
e.g. for the sum or resultant of forces. It is also the way in which 
complex numbers are summed on an Argand Diagram. The corre- 
sponding process of subtraction follows; the sum PQ=PR+PS 
gives two differences: 
PQ-PR=PS and PQ-PS=PR. 

The zero vector is PP, of no length. The negative (−PR) is the vector
PR′ such that PR + PR′ = PP and P is the mid-point of RR′. (If
the parallelogram PRQS has PR = PS and if the angle between PR
and PS opens out to 180°, then the resultant PQ collapses on P.)
Hence the negative (−PR) is a vector of the same length as PR but
reversed in direction. It is easily checked (8.9 Ex. 4) that the
difference:

        PS = PQ − PR = PQ + (−PR)
and similarly that PR = PQ − PS = PQ + (−PS).

(iii) Multiplication by a scalar: if k is a real number (scalar), then
PR = kPQ is a vector obtained as follows. If k>0, PR is in the same
direction as PQ but of k times the length, i.e. a contraction of
PQ (k<1), PQ itself (k=1), or an expansion of PQ (k>1) as in Fig.
8.2b. If k=0, the vector of zero length is obtained. If k<0, write
k = −κ where κ>0. Then κPQ is a contraction or expansion of PQ.
So PR = kPQ = −(κPQ) is the same contraction or expansion, but in
the opposite direction. In particular, if k = −1, PR = −PQ is the
vector PQ reversed in direction. Hence, multiplication by k contracts
or expands a vector, in the same direction (k>0) or with direction
reversed (k<0).

All this is in part familiar and in part strange. The familiar aspect
is that vectors form an additive group, a commutative group under
the operation of addition here defined (8.9 Ex. 5). It is commutative
since PQ is either PR + PS or PS + PR from the parallelogram. The
new aspect is the operation of scalar multiplication for expansion or
contraction. This is not multiplication of vectors; we have said
nothing, and need say nothing, about the product of two vectors.*
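
The three properties (equivalence, addition, multiplication by a scalar) can be illustrated numerically. The following is a minimal sketch, assuming each vector is represented by a number pair of components; the class `Vec`, the helper `displacement` and the sample points P, R, S, Q are illustrative choices, not notation from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec:
    """A plane vector held as a pair of components (assumed representation)."""
    x: float
    y: float

    def __add__(self, other):
        # Parallelogram rule: add component by component.
        return Vec(self.x + other.x, self.y + other.y)

    def __neg__(self):
        # Same length, direction reversed.
        return Vec(-self.x, -self.y)

    def __sub__(self, other):
        # Difference defined as sum with the negative.
        return self + (-other)

    def __rmul__(self, k):
        # Scalar multiple k * v: contraction or expansion of v.
        return Vec(k * self.x, k * self.y)

def displacement(P, Q):
    """The vector PQ carrying point P to point Q."""
    return Vec(Q[0] - P[0], Q[1] - P[1])

# A parallelogram PRQS with diagonal PQ.
P, R, S, Q = (0, 0), (3, 1), (1, 2), (4, 3)
PR, PS, PQ = displacement(P, R), displacement(P, S), displacement(P, Q)
SQ, QP = displacement(S, Q), displacement(Q, P)

print(PR == SQ)        # (i) opposite sides give equivalent vectors
print(PQ == PR + PS)   # (ii) the diagonal is the sum
print(PQ - PR == PS)   # (ii) the corresponding difference
print(QP == -PQ)       # the negative reverses direction
print(2 * PQ, -1 * PQ) # (iii) expansion, and reversal with k = -1
```

Nothing in the sketch multiplies one vector by another; only a scalar times a vector is defined, as in the text.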


8.3. Vector spaces. We have now the idea of a vector in two dimen- 
sions, something with length and direction. In generalising, we 
abandon the limitation to two dimensions and, at first, we leave over 
the features of length and direction as needing more precise definition. 
These features may be thought of as ‘purely geometric’ concepts but, 
in fact, they are as yet no more than visual impressions. We find, 
perhaps to our surprise, that geometric ‘space’ in a very general 
sense can be specified in terms of vectors, without reference to length 
and direction. Moreover, the definition is entirely algebraic, an ex- 
tension of the concept of a group. 

We start with a set V of elements u, v, w, ... which we propose to 
call vectors. The vectors of V are entities of various kinds, not at all 
necessarily numbers. They may be single numbers and quite often 
they are number pairs (or n-tuples); but they can be things without 
reference to numbers at all. The essential requirement is that an 
operation of addition (+) is defined for vectors so that V is an 
additive group, i.e. V has all the properties (including the com- 
mutative one) of a group under addition. Addition can be of the 
nature of a ‘resultant’ of two vectors but we leave it open. Being an 
additive group, V includes a zero vector which we denote by 0 as
usual.

An additive group was the starting point in defining a field (6.5). 
In that case, we created a system of double composition by adding a 
second group operation (multiplication). We now proceed on a dif- 
ferent tack, still aiming to make V a system of double composition. 
We supplement the additive group V by a second operation, but this 


* In general there is no question of writing the product of two vectors as another 
vector; a vector space is not a field with an operation x as well as +. There is an 
exception: a vector written as a number pair (x, y) in two dimensions can be inter- 
preted as a complex number (z +7y), represented on an Argand Diagram and subject 
to multiplication by the rule of 2.5. In short, a vector space in two dumensions can be 
identified as a field of complex numbers when we wish to multiply the vectors. There is 
no such identification in three or more dimensions. This is why we said in 8.1 that 
complex numbers are best left on one side in dealing with vectors; they apply only to 
the special case of two dimensions. 




is now a different one: scalar multiplication. V will not be a field like 
real numbers but something basically different. The reason for the 
difference is that, for scalar product, we need to lay our hands on an 
outside set of elements, F = {a, b, c, ...}, which we propose to call
scalars. F is both a distinct set and a set of different nature from V. 
We require F to be a field under its own operations of + and x, and 
typically F is a field of numbers. Here we take F as the field of real 
numbers. Scalars are then real numbers, taken from the field F of all 
real numbers, and quite outside the set V of vectors. 

Hence the second operation in V is scalar multiplication: given an 
element u of V, select a scalar a from F and form the scalar product
au as another vector. So one vector u of V gives another vector au 
of V on multiplication by a scalar a from F. With an eye on the two- 
dimensional representation of 8.2, we specify the properties we 
require of scalar products. There are four of them: 


The Operational Rules of Scalar Multiplication
For the set V = {u, v, w, ...} of vectors multiplied by scalars from
F = {a, b, c, ...}:

Rule                  Scalar products
S1. Closure           au belongs to V
S2. Unit scalar       1u = u
S3. Associative       a(bu) = (ab)u
S4. Distributive      a(u + v) = au + av and
                      (a + b)u = au + bu


We start with closure: a vector times a scalar always gives a vector 
(S1). Then the particular function of the unit scalar 1 is laid down: 
multiplication by 1 leaves a vector unchanged (S2). The associative 
rule (S3) is that, in multiplying by two scalars, it doesn’t matter
whether we multiply in two stages or whether we first multiply the 
scalars and then apply the product direct to the vector. The distri- 
butive rule (S4) has two parts; one shows the distribution of one 
scalar over the sum of two vectors, and the other the distribution of 
the sum of two scalars over one vector. 
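
The four rules can be checked numerically for the most familiar model, number pairs with scalars taken from a number field. The following is a minimal Python sketch, not part of the original text: the function names `add`, `scale` and `check_rules` are illustrative assumptions, and exact rational arithmetic (`fractions.Fraction`) stands in for the real numbers so that the equalities hold exactly. A check on particular values illustrates the rules; it does not prove them.

```python
from fractions import Fraction as Fr

def add(u, v):
    """Vector sum of two number pairs."""
    return (u[0] + v[0], u[1] + v[1])

def scale(a, u):
    """Scalar product au of the scalar a and the pair u."""
    return (a * u[0], a * u[1])

def check_rules(a, b, u, v):
    """Return True if S1-S4 hold for these particular scalars and vectors."""
    s1 = len(scale(a, u)) == 2                                   # S1: au is again a pair
    s2 = scale(Fr(1), u) == u                                    # S2: 1u = u
    s3 = scale(a, scale(b, u)) == scale(a * b, u)                # S3: a(bu) = (ab)u
    s4 = (scale(a, add(u, v)) == add(scale(a, u), scale(a, v))   # S4, first part
          and scale(a + b, u) == add(scale(a, u), scale(b, u)))  # S4, second part
    return s1 and s2 and s3 and s4

# Hypothetical sample values chosen only for illustration.
u = (Fr(1, 2), Fr(-3))
v = (Fr(4), Fr(5, 7))
print(check_rules(Fr(2, 3), Fr(-5, 4), u, v))
```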

These rules are so framed that they form an ‘economical’ list of 
properties. They can be specified as the axioms for scalar products. 
They are complete, consistent and independent of each other. In this 




matter, two observations are appropriate. First, the commutative 
rule does not arise here; it is not that it fails to hold but rather that 
we do not need it. We are operating in V and, in doing so, we dip 
outside for a scalar a to multiply u to give au. We are not operating 
in F; and so, given a in F, we never dip outside for u to give ua in F.
Secondly, the zero scalar 0 in F is related to the zero vector 0 in V:
0u = 0. Here, we write the same symbol 0 for two different zero
elements, one in F and one in V. This simplifies matters and the
context makes it clear always which zero is which. Further, the
scalar −1 in F is related to negative vectors in V: (−1)u = −u. As
operational rules both 0u = 0 and (−1)u = −u may be added to S2
above. They can, however, be derived from the other rules (8.8 Ex.
6 and 8).
Pulling together the threads of the discussion, we define: 


DEFINITION: The set V = {u, v, w, ...} is a vector space over the field
F = {a, b, c, ...} of scalars, if the two operations of addition of vectors
and of multiplication of a vector by a scalar are defined in V so that V is
a commutative group under addition and so that scalar products satisfy
the properties S1, S2, S3 and S4 above.


A field is a specialised set, satisfying a long list of operational rules for 
sums and products; typical cases are the rational, real or complex 
numbers. A vector space is also a specialised set, satisfying a long list 
of operational rules for sums and scalar products. The two lists in 
part overlap (for sums) but they are also different (products and 
scalar products differ in their nature). 

As a matter of attaching a label, we now say that the general 
concept of ‘space’ in geometry is any set of vectors V with these 
specialised properties. We define ‘vector space’ algebraically, and 
call it ‘geometric space’. We no longer have the kind of set typically 
represented by numbers. But we do have a link with numbers since 
the set V of vectors is defined over the field F of numbers. It is this 
link which enables us to use numbers, in the form of co-ordinates, in 
geometry. 

Vector space is a very abstract concept. The question remains: 
how do we distinguish one space from another, with particular 
reference to their possible applications? The answer turns on what 
additional properties we allocate to vector space, and in particular 




what concept of ‘distance’ in space we choose to define. Various 
Euclidean and non-Euclidean spaces are obtained by varying the 
concept of distance used in the general vector space. We have seen 
to it that our abstract vector space has all the required properties of 
the two-dimensional vectors of 8.2, except for length and direction. 
These latter are still to be specified. 


8.4. Euclidean space. A particular case of Euclidean space, that 
appropriate to the two dimensions of the plane, is here developed. 
Once this is done, the extension to more than two dimensions is 
easily achieved. Euclidean space is the general vector space V of 
8.3 with two specific properties added. One is that each vector of V 
is represented by an ordered n-tuple of real numbers, drawn from 
the same field F as the scalars. Here, with n=2 (two dimensions), 
take a vector as an ordered pair (x, y) of real numbers. Specify the
operations of addition and scalar multiplication: 


DEFINITION: The product of (x, y) by the scalar k is (kx, ky) and the
sum of (x₁, y₁) and (x₂, y₂) is (x₁ + x₂, y₁ + y₂) ......... (1)

These are obvious interpretations of the two operations. It is easily 

checked that the vectors form an additive group and satisfy the 

rules S1-S4 of 8.3. The additive group has a zero, the vector (0, 0). 
The other property is the definition of distance and angle: 


DEFINITION: If (x₁, y₁) and (x₂, y₂) are two vectors of the space V, the
distance d and angle θ° between them are given by the scalars:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \quad\text{and}\quad \cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}} \qquad (2)$$


Note that d is the positive square root of a positive scalar. In giving 
the angle θ°, the scalar shown measures an angle in much the same
way as a thermometer reading measures a temperature. With various 
values of the scalar are associated corresponding measures of angle 
in the unit called ‘degrees’. Two specific ‘scaling points’ are needed: 
the zero angle 0° for which the scalar is 1, and the right-angle 90° for 
which the scalar is 0. The scalar is to be identified as the trigonometric 
ratio (cosine) and it is so written in (2); later we must make sure that 
the notation cos θ is justified. 
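
As a hedged illustration of definition (2), the short Python sketch below computes the distance and the angle scalar for number pairs and checks the two ‘scaling points’ mentioned above. The function names `distance` and `cos_angle` are assumptions introduced here, and floating-point numbers stand in for real numbers.

```python
import math

def distance(p1, p2):
    """Distance d between the vectors p1 and p2, as in (2)."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def cos_angle(p1, p2):
    """The scalar of (2) that measures the angle between p1 and p2.

    Neither vector may be the zero vector (0, 0).
    """
    (x1, y1), (x2, y2) = p1, p2
    return (x1 * x2 + y1 * y2) / (math.hypot(x1, y1) * math.hypot(x2, y2))

print(distance((0, 0), (3, 4)))    # 5.0
print(cos_angle((1, 0), (2, 0)))   # 1.0, the zero angle
print(cos_angle((1, 0), (0, 1)))   # 0.0, the right angle
```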




There is no reference yet to any application to physical space, e.g. 
the plane of this paper or the surface of the earth. We have to attach 
appropriate labels and to check that the abstract properties corre- 
spond with physical properties (suitably idealised). The vector (x, y) 
can be called the point P and the whole set of vectors V, as various 
selections of real numbers x and y are made, makes up the plane of 
points P. The zero vector (0, 0) is labelled the origin O, the point in 
the plane from which distances can be conveniently measured. 
Alternatively, the (algebraic) vector (x, y) can be called the geometric 
vector OP from O to P, in line with the idea of a vector discussed in 
8.2. 

The rules (1) for scalar products and sums can be re-interpreted. 
If P is (x, y), write Q(kx, ky) as the product of P by the positive
scalar k. By (2), the angle between P and Q is 0° and the distance
OQ is k times the distance OP. This is what we mean when we say
that Q is on the line through O and P. Q is between O and P (k<1),
at P (k=1) or beyond P (k>1); the last is illustrated in Fig. 8.4a.
For negative k, Q still lies on the line OP but on the opposite side
of O from P; the distance OQ is |k| times the distance OP.

Let P₁ be (x₁, y₁) and P₂ (x₂, y₂). The sum is the point P(x, y) such
that x = x₁ + x₂ and y = y₁ + y₂. As a definition, say that OP₁PP₂ is a
parallelogram, the sum OP being the diagonal, obtained from the sides
OP₁ and OP₂. The (geometric) vector sum OP = OP₁ + OP₂ is the
resultant of the separate (geometric) vectors OP₁ and OP₂. Fig. 8.4a
illustrates. Similarly, the difference P′(x′, y′) between P₂ and P₁ has
x′ = x₂ − x₁ and y′ = y₂ − y₁. Then the (geometric) vector difference
OP′ = OP₂ − OP₁ means that OP₂ is the resultant of OP₁ and OP′.

[Fig. 8.4a: the scalar product OQ = kOP for P(x, y) and Q(kx, ky); the sum P(x₁ + x₂, y₁ + y₂) of P₁(x₁, y₁) and P₂(x₂, y₂); the difference P′(x₂ − x₁, y₂ − y₁).]

So far, all geometric vectors are from the fixed point O to any
point P. We can agree, however, to call the vector from P₁ to P₂
the same as the difference vector OP′ = OP₂ − OP₁. They are opposite




sides of a parallelogram. We can then use the familiar triangle of
vectors (Fig. 8.4b). We have:


OP₂ = OP₁ + OP′ = OP₁ + P₁P₂ and P₁P₂ = OP′ = OP₂ − OP₁.


We can read vectors around the triangle of Fig. 8.4b, e.g. OP₂ is the
sum of OP₁ and P₁P₂, or P₁P₂ is the difference OP₂ less OP₁.

We are now ready to map the algebraic vectors (x, y) onto a set of
points P in an abstract plane, with familiar applications to a physical
plane. The origin O is inserted as a starting point, and then two other
points, A(1, 0) and B(0, 1), to serve as measuring rods for distances.
By (2), each of A and B is unit distance from O. By (2), also, the angle
between A and B (or between the geometric vectors OA and OB) is α°
where cos α = 0. The angle is 90°; OA and OB are perpendicular. The
conventional siting of A and B is to take OA horizontal (A unit distance
to the right of O) and OB vertical (B unit distance above O) as shown
in Fig. 8.4c. The set of points M(x, 0) for real x defines a directed
line or axis Ox passing through O and A. For, by (1), (x, 0) is the
scalar x times (1, 0); by (2), the distance OM is x and the angle
between A and M is 0°. The points M(x, 0) make up an ordered set on
the directed line Ox, to the right of O (x>0) or to the left (x<0),
increasing in distance from O as |x| increases. Similarly, the set of
points N(0, y) for real y defines an axis Oy through B. The axes are
perpendicular.

[Fig. 8.4b]

For points P(x, y), neither x nor y zero, consider first the case of
positive values (x>0, y>0) to get what we know as the positive
quadrant of the plane. If M is (x, 0) and N (0, y), then OM and ON sum
to OP where P is (x, y), i.e. OP is the diagonal of the parallelogram
formed by OM and ON. But OM and ON are at right angles and the
parallelogram is what we know as a rectangle. Hence to locate P(x, y):
take M a distance x along Ox, N a distance y along Oy, complete the
rectangle and get P as the corner opposite O. P is located by means of
‘co-ordinates’ (x, y) and with reference to axes Ox and Oy.




Further details elaborate the picture. The geometric vector OP
can be said to have length p, where p is the distance from O(0, 0) to
P(x, y). Then p = √(x² + y²) by (2). The vector also has direction, given
by the angle α° which OP makes with Ox. Define trigonometric
ratios from the triangle OPM:

$$\cos\alpha = \frac{OM}{OP} = \frac{x}{p} = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin\alpha = \frac{MP}{OP} = \frac{y}{p} = \frac{y}{\sqrt{x^2 + y^2}} \qquad (3)$$

where OM, OP and MP are distances and where MP is

$$\sqrt{(x - x)^2 + (y - 0)^2} = y$$

by (2). But the angle α° is that between OP and OA, i.e. between
vectors (x, y) and (1, 0), so that (2) gives cos α = x/√(x² + y²). We
have agreement; the trigonometric ratios tie in with the basic
property (2) of angles between vectors. The notation (2) is justified.
The results (3) can be expressed most conveniently:

$$x = p\cos\alpha \quad\text{and}\quad y = p\sin\alpha \qquad (4)$$


giving the ‘co-ordinates’ x and y in terms of the length p of the vector 
OP and the angle α° it makes with Ox (Fig. 8.4c). P can be located,
either by specifying x and y, or by specifying p and α. The pair (x, y)
are the Cartesian co-ordinates of P, named after Descartes (1596-
1650), and the pair (p, α) are the polar co-ordinates of P. They are
related by (4). 
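
The relation (4) between the two kinds of co-ordinates can be sketched in Python as follows. This is an illustration under stated assumptions rather than part of the text: the names `to_polar` and `to_cartesian` are invented here, angles are handled in degrees to match the α° of the text, and `math.atan2` is used to recover the angle, which already covers the non-acute angles whose treatment the text defers to the Appendix.

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) to polar (p, alpha in degrees)."""
    p = math.hypot(x, y)                      # p = sqrt(x^2 + y^2), by (2)
    alpha = math.degrees(math.atan2(y, x))    # angle that OP makes with Ox
    return p, alpha

def to_cartesian(p, alpha_deg):
    """Polar (p, alpha in degrees) to Cartesian (x, y), by (4)."""
    a = math.radians(alpha_deg)
    return p * math.cos(a), p * math.sin(a)

p, alpha = to_polar(1.0, 1.0)
print(p, alpha)                 # roughly 1.414 and 45 degrees
print(to_cartesian(p, alpha))   # roughly (1.0, 1.0) again
```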

The extension to the whole set of vectors V and to the whole plane 
Oxy is a matter of allowing for negative values of x and y in (x, y).
See 8.9 Ex. 14-16. This involves the negative of the vector OP 
(multiplication by the scalar — 1) as a vector of the same length but 
opposite in direction. Trigonometric ratios need to be extended to 
apply to angles which are not acute in such a way that the relations 
(4) are preserved (Appendix A.7 and A.9). In the end, the variation 
of P over the whole plane corresponds to the set of vectors (x, y) for 
any real x and y. An ordering is involved, i.e. the ordered set of 
points (x, 0) on the axis Ox and the similar ordered set of points 
(0, y) on the other axis Oy. The ordering is precisely that of the field 
of real numbers.




Euclidean space, in the two dimensions of ‘plane geometry’, is 
defined algebraically by writing a vector (x, y) to satisfy the requirements
of a vector space over the field F of real numbers, and by
taking addition, scalar products, distances and angles as defined by
(1) and (2) above. The extension to any number of dimensions is
immediate. In three dimensions, for application to ‘solid geometry’,
vectors are taken as triples (x, y, z), where x, y and z are real numbers
from F. The definitions (1) and (2) only need appropriate extension
to allow for three terms instead of two. Points, vectors, distances and
angles in three dimensions are referred to three axes Ox, Oy and Oz
mutually at right angles. Further, in abstract Euclidean space of n
dimensions, vectors are n-tuples (x₁, x₂, ..., xₙ) and there are n axes
mutually at right angles. The algebraic development is perfectly
capable of supporting n dimensions, even when n>3; the difficulty
is to visualise the resulting points and vectors.

In general, a vector space of n-tuples (x₁, x₂, ..., xₙ) over a field F
is denoted by Vₙ(F). It depends both on the number n of dimensions
and on the field, e.g. of real numbers, used for co-ordinates and
scalars alike. If appropriate definitions of distance and angle are
added, Vₙ(F) becomes Euclidean space of n dimensions, denoted by
Eₙ(F). The case examined here is E₂(F) over the field F of real
numbers.*


8.5. Non-Euclidean spaces. Two properties of Euclidean space are 
particularly to be noticed. Consider the triangle OP₁P₂ of Fig. 8.4b
above, where the vector OP₂ is the sum of the vectors OP₁ and P₁P₂.
Denote vector lengths:


|OP| = length of OP = √(x² + y²),
where P(x, y) is the vector. Then, from the vector sum
OP₂ = OP₁ + P₁P₂,
it follows:
|OP₂| ≤ |OP₁| + |P₁P₂|.


* An attempt is sometimes made to generalise Euclidean space, by refraining from 
specifying that the vectors are n-tuples of scalars from F, and by replacing the 
definition of distance and angle, (2) above, by certain scalars called ‘inner products’ 
with appropriate properties. The Euclidean space so obtained turns out to be isometric 
(i.e. isomorphic, preserving distance and angle) with the space Eₙ(F) here obtained
for n-tuple vectors. It is essentially the same; nothing of importance is gained. 




The equality holds only if OP₁ and OP₂ are in the same direction.
From this comes the Euclidean property: a line is the shortest
distance between two points.

Further, the vectors OP₁ and OP₂ have a unique difference (in the
additive group of vectors): the vector OP₂ − OP₁ = P₁P₂ = OP′,
where P′ completes the parallelogram in Fig. 8.4b. Hence the unique
OP’ is the parallel through O to the given vector P,P,. The result: 
a unique parallel to a given line can be drawn through an outside 
point. This is another of Euclid’s postulates. 

Going back to general vector space, we can throw in a definition of 
distance which is different from that chosen for Euclidean space and, 
at the same time, we can define lines so that they are still the shortest 
distance between two points (vectors). However, the property about 
parallels through a given point (Euclid’s postulate) will no longer be 
necessarily true with a new definition of distance. Two variants of the 
property can arise: 

(i) There is no parallel to a given line through an outside point. 

(ii) There are several parallels to a given line through an outside 

point. 
The spaces so obtained are non-Euclidean. In case (i), they 
are called elliptic, and often named after Riemann (1826-66) who 
investigated their properties. In case (ii), they are called hyperbolic, 
and often named jointly after the two mathematicians Bolyai 
(1802-60) and Lobachevsky (1793—1856)* who first established in 
detail the construction of non-Euclidean geometries. 

Two illustrative examples are given here, without attempting to 
go into the details of the specification and properties of the spaces. 
Both examples, however, are of interest in that they can have 
practical applications. 

The first example is that of spherical geometry, the geometry of 
two-dimensional points and lines on the surface of a sphere (in 
Euclidean space of three dimensions). It is thus possible to translate 
from non-Euclidean space of two dimensions into Euclidean three- 
dimensional terms. In spherical geometry, lines are the shortest 
distances between points on the surface of a sphere, i.e. they are 
great circles of the sphere. Three lines intersecting in a spherical 
triangle are shown in Fig. 8.5a. For two lines to intersect in a unique 

* This is the Lobachevsky rendered into song by Tom Lehrer. 




point (and three lines in a unique triangle), 
we must adopt the convention that a point 
and its antipodal counterpart (on the other 
side of the sphere) are the same. One and 
only one line can be drawn through two 
distinct points; but for a unique distance 
between the points, we need the convention 
that the shorter arc (of the great circle) is 
taken. Then every pair of lines intersects in a 
point without exception. There are no par- 
allel lines. If we draw a circle on the sphere 
parallel to a great circle like AA’ (e.g. by drawing ‘tram lines’ round 
the equator), the second circle is not a great circle, it is not the shortest 
distance between points on it and, in fact, it is not a line. Hence we 
have an elliptic non-Euclidean geometry, of type (i). One feature of 
triangles (spherical triangles) is that the sum of the angles is greater 
than 180°. For example, one line can be the equator (AA′), a second
can be a polar great circle (BB′), and the third can be another polar
great circle (CC′ shifted to intersect BB′ at the poles, B and B′). The
triangle so formed has two right angles (on the equator AA′)
and a third (non-zero) angle at a pole.

The second example is that of the two-dimensional geometry of the 
inside of a circle. The circle is in Euclidean space of two dimensions; 
this space extends outside the circle whereas the non-Euclidean 
space is confined within it. Again a translation into Euclidean terms 
is possible. A line in Euclidean space, as the shortest distance between 
points, is the path of a light ray with a constant velocity in a homo- 
geneous medium. Assume that the inside of the circle is a non- 
homogeneous medium such that the velocity of light varies from 
point to point. Then a line inside the circle, 
as the path travelled by a light ray, appears 
to be curved to the Euclidean outsider. For 
example, Fig. 8.5b illustrates the case where
the velocity of light is proportional to the
distance of the point from the circumference
of the circle; lines are arcs of (Euclidean)
circles cutting the boundary at right angles.
Suppose BC is a given line and A a point not




on BC; BB′ and CC′ are lines through B and C respectively, which meet
at A, making a triangle ABC. Another line through A (such as DD′)
in the shaded angle BAC cuts BC. But a line through A (such as
EE′) in the other angle does not cut BC, i.e. it is parallel to BC, and
there are many such parallels. We have a hyperbolic non-Euclidean
geometry, of type (ii). In this geometry, the sum of the angles of a
triangle is less than 180°. The triangle ABC shown has zero angles at
B and C and a third angle of approximately 90°. This is the non-Euclidean
geometry of Poincaré (1852-1912); it has relevance to
certain theories of the actual universe of the astronomer and 
physicist. 


8.6. Co-ordinate geometry. In 8.5 we have taken a line as the shortest 
distance between two points. We now abandon this idea. Instead, in 
returning to Euclidean space, we exhibit a line as a special case of a 
‘locus’, the geometric equivalent of the algebraic concept of a 
‘relation’. 

From sets X and Y, each consisting of all real numbers, form the 
Cartesian product X . Y = {(x, y) | x ∈ X, y ∈ Y} as the set of all
ordered pairs (x, y) of real numbers. A relation is a proper subset of
X . Y, specified by some statement yRx, and denoted by


{(x, y) | x ∈ X, y ∈ Y, yRx},


or more shortly by yRx. A locus is a set of points (x, y) in the plane
Oxy, the points for which some statement yRx holds. If an actual
plotting of the locus is made on a physical plane, the result is a 
graph; it can be regarded either as the graph of the relation or as the 
graph of the locus. Hence, given a relation yRx, the geometric 
representation is a locus and an actual plotting is a graph. Co- 
ordinate geometry is the study of properties of loci by means of an 
algebraic treatment of corresponding relations. 

The relation/locus concept is a very wide one; it is just any subset 
of real number pairs (x, y) or of points in a plane. As shown in 7.1, it 
may be specified by a single equation or inequality, e.g. x² + y² = 1
is the locus of points on a circle, and x² + y² < 1 is the locus of points
within a circle. It is quite possible, indeed common, to specify a 
relation/locus by two or more equations or inequalities. The locus is 
then the intersection or union of two or more subsets, one for each 




of the equations/inequalities. In such cases, we often speak of the 
locus as the ‘solution set’ obtained by the intersection (or union) of 
the separate sets (see 6.8 above). Consider three examples: 
(i) x² + y² = 4 and x > 0.

The locus of this relation consists of all points on a semi-circle, centre
at O and radius 2, as shown in Fig. 8.6a. It is the intersection of the
set of points on the circle (x² + y² = 4) and the set of points in the
half-plane to the right of Oy (x > 0).


[Fig. 8.6a: graphs of the loci (i), (ii) 1 < x² + y² < 4 and (iii) 0 < x < 1 or 0 < y < 1.]


(ii) 1 < x² + y² < 4.
The locus is the set of points in the ring-like space between two
circles (both with centre at O, of radius 1 and 2 respectively). It is
the intersection of the set of points within one circle (x² + y² < 4) and
the set of points outside the other circle (x² + y² > 1).

(iii) 0 < x < 1 or 0 < y < 1.
This is a relation which specifies ‘or’ rather than ‘and’. The locus
corresponds to the union (rather than the intersection) of two sets.
One set is the vertical ‘band’ of points between Oy and the line parallel
to Oy and distance 1 from it (0 < x < 1), shown with one hatching in
the graph (Fig. 8.6a). The other set is a similar horizontal ‘band’ of 
points shown with another hatching in the graph (0<y<1). The 




relation specifies that one or other (or both) of the statements 
(0 < x < 1, 0 < y < 1) holds. The locus is the union of the two sets, i.e.
all points hatched in the diagram. Compare: 0 < x < 1 and 0 < y < 1,
a relation and locus which are the intersection of the two sets, as 
cross-hatched in the graph. 
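
The way ‘and’ gives an intersection and ‘or’ gives a union can be sketched with membership tests in Python. This is an illustrative sketch only: the function names are invented here, the inequalities are taken as strict, and the equation in (i) is tested to within a small tolerance `tol` because floating-point numbers only approximate real numbers.

```python
def on_semicircle(x, y, tol=1e-9):
    """Example (i): x^2 + y^2 = 4 and x > 0 (an intersection)."""
    return abs(x * x + y * y - 4) <= tol and x > 0

def in_ring(x, y):
    """Example (ii): 1 < x^2 + y^2 < 4 (an intersection)."""
    return 1 < x * x + y * y < 4

def in_bands(x, y):
    """Example (iii): 0 < x < 1 or 0 < y < 1 (a union)."""
    return 0 < x < 1 or 0 < y < 1

def in_square(x, y):
    """The comparison case: 0 < x < 1 and 0 < y < 1 (an intersection)."""
    return 0 < x < 1 and 0 < y < 1

print(in_bands(0.5, 5.0), in_square(0.5, 5.0))   # True False
```

The point (0.5, 5) lies in the vertical band and so in the union, but not in the cross-hatched square.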

A function y =f (x) is a particular kind of relation, that for which a 
unique y corresponds to any given x in its domain. The locus is then 
a set of points in Oxy with the property that vertical lines (parallel 
to Oy) cut it in no more than a single point. None of the loci of the 
above examples is of this kind; the relations are not functions. If a 
function is given algebraically, then we can write y explicitly in
terms of x in some way, as illustrated: 

(iv) x² + y² = 4 and y < 0.

Here y = −√(4 − x²) is the explicit function. The locus is a semi-circle
below Ox (centre at O and radius 2) as graphed in Fig. 8.6b.


[Fig. 8.6b: graphs of the functions (iv), (v) and (vi), the last labelled y = least integer not less than x (x > 0).]


(v) y = √(x²) − 1.
This is in explicit form and it can also be written y = |x| − 1, where
|x| is the absolute value of x. The locus consists of points on two
half-lines meeting at (0, −1), as graphed.

(vi) y = 1 when 0 < x ≤ 1; y = 2 when 1 < x ≤ 2; y = 3 when
2 < x ≤ 3; ...
The explicit statement of the function is:


y = least integer not less than x (x > 0).




It is a step-function. The locus is a set of points which ascend in steps, 
as graphed; at each step, the point of the locus is on the lower rung 
(x = 1, 2, 3, ...).
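
The step-function of example (vi) is the ‘ceiling’ of x. A minimal Python sketch, assuming the name `step` (not from the text) and using the standard `math.ceil`:

```python
import math

def step(x):
    """y = least integer not less than x, defined here only for x > 0."""
    if x <= 0:
        raise ValueError("the function of example (vi) is taken for x > 0")
    return math.ceil(x)

# At a whole number the point of the locus is on the lower rung.
print(step(0.5), step(1), step(1.2), step(3))   # 1 1 2 3
```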

The simplest general relation is the linear relation specified by 
ax + by + c = 0, where a, b and c are real parameters, given in the form
of two ratios a : b : c, and such that a and b are not both zero. If
a ≠ 0, the two parameters are b/a and c/a. If b ≠ 0, they can be
written a/b and c/b. The corresponding locus is a line:


DEFINITION: A (straight) line is the locus of points (x, y) in the plane
Oxy satisfying the linear relation ax + by + c = 0, where a, b and c are
given real numbers (a and b not both zero).


Distinguish first a special case, and then the more general case: 


Case b = 0. Hence a ≠ 0 and ax + c = 0 or x = (−c/a), i.e. x is defined for
only one real number (−c/a) and y is then any real number. The
locus is a line parallel to Oy, as graphed in Fig. 8.6c in the particular
case x = 1.

Case b ≠ 0. Hence ax + by + c = 0 can be written
y = (−a/b)x + (−c/b) and y is a function of x defined
on the set of all real numbers. The locus is a line
not parallel to Oy, as graphed for x + y − 1 = 0, or
y = 1 − x, in Fig. 8.6c.

The main properties of a line can be summarised : 

THEOREM: The line ax + by + c = 0 contains the
point

P(λx₁ + μx₂, λy₁ + μy₂)
if it contains the points P₁(x₁, y₁) and P₂(x₂, y₂),
for any real numbers λ and μ such that λ + μ = 1.
Conversely, if a locus contains the point P whenever
it contains P₁ and P₂, then the locus is a line.
The proof is simple algebra. For the direct result:


ax₁ + by₁ + c = 0 and ax₂ + by₂ + c = 0
since P₁ and P₂ are on the line. Hence:
λ(ax₁ + by₁ + c) + μ(ax₂ + by₂ + c) = 0 for any λ and μ
i.e. a(λx₁ + μx₂) + b(λy₁ + μy₂) + c(λ + μ) = 0
i.e. a(λx₁ + μx₂) + b(λy₁ + μy₂) + c = 0 since λ + μ = 1.
Hence, P is on the line, as required. For the converse:


Fix any two points P₁(x₁, y₁) and P₂(x₂, y₂) on the locus (whatever
it is). Then P(x, y) is a point on the locus where
x = λx₁ + μx₂ and y = λy₁ + μy₂ (λ + μ = 1).
Let λ and μ be any real numbers (subject only to λ + μ = 1) and determine
a relation between x and y by eliminating λ and μ:


y₂x = λx₁y₂ + μx₂y₂ and x₂y = λx₂y₁ + μx₂y₂


i.e. y₂x − x₂y = λ(x₁y₂ − x₂y₁) by difference.
Similarly: y₁x − x₁y = −μ(x₁y₂ − x₂y₁).
Subtract and put λ + μ = 1:

(y₂ − y₁)x − (x₂ − x₁)y = (x₁y₂ − x₂y₁) ......... (1)
Since (1) is a linear relation, the locus is a line. Q.E.D.
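
The theorem can be illustrated numerically. The Python sketch below, offered as an illustration and not as part of the proof, builds the coefficients of the line through two points from relation (1) and confirms that λP₁ + μP₂ with λ + μ = 1 satisfies it. The names `line_through`, `combine` and `on_line` and the sample points are assumptions made here; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction as Fr

def line_through(p1, p2):
    """Coefficients (a, b, c) of ax + by + c = 0 through p1 and p2, from (1)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1, -(x2 - x1), -(x1 * y2 - x2 * y1))

def combine(lam, p1, p2):
    """The point lam*P1 + mu*P2 with mu = 1 - lam."""
    mu = 1 - lam
    return (lam * p1[0] + mu * p2[0], lam * p1[1] + mu * p2[1])

def on_line(coeffs, p):
    """True if the point p satisfies ax + by + c = 0."""
    a, b, c = coeffs
    return a * p[0] + b * p[1] + c == 0

# Hypothetical sample points chosen only for illustration.
p1, p2 = (Fr(1), Fr(2)), (Fr(4), Fr(3))
line = line_through(p1, p2)                    # the line x - 3y + 5 = 0
for lam in (Fr(0), Fr(1, 3), Fr(1), Fr(2)):
    print(lam, on_line(line, combine(lam, p1, p2)))
```

Each value of λ tried gives a point on the line, including λ = 2, which lies outside the segment P₁P₂.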


The result can be expressed in terms of vectors and their sums and
scalar products. Given two vectors P₁(x₁, y₁) and P₂(x₂, y₂), write
the vectors which are scalar products by λ and μ respectively:
λP₁: (λx₁, λy₁) and μP₂: (μx₂, μy₂).

Add these vectors to give the vector P:

P = λP₁ + μP₂ i.e. P: (λx₁ + μx₂, λy₁ + μy₂).
The theorem then states: the line through P₁ and P₂ contains
P = λP₁ + μP₂ if and only if λ + μ = 1. This is, indeed, an alternative
definition of a line (see 8.9 Ex. 13), i.e. a definition in geometric rather
than algebraic terms.

A line through O(0, 0) is of particular interest. If the relation is 
ax + by + c = 0, x = 0 and y = 0 must satisfy it, i.e. c = 0. A line through
O is the locus ax + by = 0 for any given ratio a : b. By the theorem in
vector form, if P₁(x₁, y₁) is a point on the line (in addition to O), then
P = λP₁ + (1 − λ)O = λP₁ is on the line for any λ. Hence a line through
O is described by scalar multiples λP₁ of a given vector P₁. Scalar
multiples differ only in length, not in direction. 

The scalars λ and μ determine which points are ‘between’ P₁ and
P₂ on the line through P₁ and P₂, and which are ‘outside’ P₁ and P₂.
Write μ = 1 − λ, so that

P = λP₁ + (1 − λ)P₂ (any λ)
gives a point P on the line P₁P₂. The cases are:
λ = 0: P at P₂; λ = 1: P at P₁;
0 < λ < 1: P between P₁ and P₂;
Otherwise: P not between P₁ and P₂.




The property of between-ness corresponds to the ordering of the real 
numbers. If P₁ is (x₁, y₁) and P₂ (x₂, y₂), then P is

{λx₁ + (1 − λ)x₂, λy₁ + (1 − λ)y₂}.
Hence, in the case of Fig. 8.6d, M is between M₁ and M₂ on Ox if
x₁ < λx₁ + (1 − λ)x₂ < x₂, i.e. if 0 < λ < 1. Similarly, N is between N₁ and
N₂ on Oy if 0 < λ < 1. It is in this sense that P is between P₁ and P₂
if 0 < λ < 1.

Case b = 0, line parallel to Oy, is easily disposed of. If P₁(x₁, y₁)
and P₂(x₂, y₂) are two points on the line, then x₁ = x₂. The general
case, b ≠ 0, has x₁ ≠ x₂ for any two points P₁ and P₂ on the line.
This is now assumed.

The slope of the line ax + by + c = 0, or y = (−a/b)x + (−c/b), is
defined to be the ratio (−a/b). P₁(x₁, y₁) and P₂(x₂, y₂) are any two
points on the line so that:
ax₁ + by₁ + c = 0 and ax₂ + by₂ + c = 0.
Subtract: a(x₂ − x₁) + b(y₂ − y₁) = 0

i.e. slope = −a/b = (y₂ − y₁)/(x₂ − x₁) ......... (2)

[Fig. 8.6d]

Further, if the length P₁P₂ is p and if the direction of the line is given
by the angle α it makes with Ox, then:

p = √{(x₁ − x₂)² + (y₁ − y₂)²} and tan α = slope = (y₂ − y₁)/(x₂ − x₁).
Here tan α is a trigonometric ratio, equal to sin α/cos α. Fig. 8.6d
illustrates; see also 8.9 Ex. 3.

To determine the equation of a particular line, we need to be given 
either two points on the line, or one point on the line and the slope. 
From (1) and (2), it follows that the equations are: 


Two points P₁(x₁, y₁) and P₂(x₂, y₂): $y - y_1 = \dfrac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$

One point P₁(x₁, y₁) and slope m: $y - y_1 = m\,(x - x_1)$ ......... (3)


Each of (3) is a linear relation ax +by+c=0 with particular forms 
for the parameters. 
A non-linear locus is to be specified by the non-linear relation yRx


which defines it. Some of the loci are ‘curves’, such as circles, parabolas, 
ellipses and hyperbolas, as examined in plane geometry. To illustrate 
simply : 

(x − a)² + (y − b)² = r² (a, b, r given real numbers, r > 0)


is a relation with three parameters a, b and r. The corresponding
locus is defined to be a circle. The familiar geometric property of a
circle then follows at once. If P is any point (x, y) on the locus, and
if Q is the given point (a, b), then:

PQ = √{(x − a)² + (y − b)²} = r > 0 by the relation.


Hence P is a given distance r from the point Q. A circle is the locus 
of a point which is a given distance r from a given point (a, b). The
parameters a, b and r are at choice; the centre of the circle is (a, b)
and the radius r. If a=b=0 and r=2, then x² + y² = 4 is a circle centre
O and radius 2. 
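
The circle relation can likewise be read as a membership test on the distance PQ. A small Python sketch, with the name `on_circle` and the tolerance `tol` introduced here as assumptions (a tolerance is needed because floating-point numbers only approximate real numbers):

```python
import math

def on_circle(x, y, a, b, r, tol=1e-9):
    """True if P(x, y) is at distance r from the centre Q(a, b)."""
    return abs(math.hypot(x - a, y - b) - r) <= tol

print(on_circle(2, 0, 0, 0, 2))   # a point of the circle with centre O and radius 2
print(on_circle(1, 1, 0, 0, 2))   # a point inside that circle, not on it
```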


8.7. Projective geometry. Distances and angles, defined in Euclidean 
space, are invariant under a particular group of transformations, i.e.
translations, rotations and reflections. These transformations are
rigid motions not affecting the shape of any figure or of any locus as
a set of points. In particular, the distance √{(x₁ − x₂)² + (y₁ − y₂)²}
between two points P₁(x₁, y₁) and P₂(x₂, y₂) is invariant (8.9 Ex. 23).
Other kinds of transformations do not leave distances or angles 
invariant; they distort a ‘figure’. Here we consider briefly one such 
group of transformations (projections) to illustrate the fact that 
invariant properties depend on the particular type of transformation 
applied. 

In projective geometry, we view the properties of points in two 
dimensions from a vantage point in three dimensions. The idea is 
rather similar to that of using complex numbers (and an Argand 
diagram in two dimensions) to get properties of real numbers (points 
on a line in one dimension). We have extra freedom in dealing with 
real numbers; for example, the sum of squares $x^2 + y^2$ is exhibited as
the product of conjugate complex numbers $(x + iy)(x - iy)$, and the
negative (−x) is obtained from x by two operations of multiplication
by i (rotation through 90°).

Consider points such as P and figures such as the triangle ABC on
a plane Π. In three-dimensional space, select another plane Π′ and
some point Q not on either Π or Π′ as in Fig. 8.7a. Join Q to the point
P on Π and let the line QP cut Π′ in the point P′. Then P on Π is
said to have image P′ on Π′ by projection from the given centre Q.
This is a transformation of points P on Π into corresponding images
P′ on Π′. Applying the transformation to the triangle ABC on Π, we
get a triangle A′B′C′ on Π′. Projection from Q sends points into
points and lines into lines. But the shape of a figure on Π is altered
when projected into its image on Π′. Distances and angles are not
invariant, so that (for example) the triangle ABC and its image
A′B′C′ can be of quite different shapes.

[Fig. 8.7a]

Given the centre Q but a whole set of planes Π, Π′, Π′′, ..., we
obtain a set of transformations (projections) from points on any one
plane to images on another plane. The product of two transformations
can be defined as successive projections, from points on Π to
points on Π′ and then from points on Π′ to points on Π′′. It is easily
shown (8.9 Ex. 25) that the transformations form a group under the
operation of multiplication. The question is: for a projection of the
group, what properties of figures remain invariant? The most striking
is a property, the ‘cross-ratio’, of four points A, B, C and D on a line:


DEFINITION: The cross-ratio of four collinear points

$$(ABCD) = \frac{AB \cdot CD}{AD \cdot CB} = \frac{AB/AD}{CB/CD}.$$

[Fig. 8.7b: two arrangements of the four points on a line; order A, C, B, D
with (ABCD) > 0 and order A, B, C, D with (ABCD) < 0]

Here we can regard A and C as reference points in terms of which we
can assess the position of two other points B and D. One ratio AB/CB
fixes B in relation to A and C, where AB and CB are lengths with an
appropriate sign attached. In Fig. 8.7b, positive distances are to the
right, negative to the left. So AB and CB are both positive if B is to
the right of A and C (AB/CB > 0) but AB is positive and CB negative if
B is between A and C (AB/CB < 0). Similarly, the other ratio AD/CD
fixes D in relation to A and C. The cross-ratio is obtained by dividing
one ratio AB/CB by the other AD/CD; it can be positive or negative.
Two cases where (ABCD) > 0 and (ABCD) < 0 respectively are
illustrated; other cases can be drawn with the four points in various
positions. It is easily checked that (ABCD) > 0 means that both or
neither of B and D lie between A and C, (ABCD) < 0 that just one of
B and D lies between A and C. The particular case (ABCD) = −1 has
B and D dividing AC in the same ratio, one internally and the other
externally (8.9 Ex. 28).
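The sign rule and the harmonic case can be tried out directly. The sketch below assumes the four points are given by signed distances a, b, c, d from some fixed origin on the line, so that AB = b − a and so on; the function name `cross_ratio` is hypothetical.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """(ABCD) = (AB . CD) / (AD . CB), using signed lengths such as AB = b - a."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    return ((b - a) * (d - c)) / ((d - a) * (b - c))


# Order A, C, B, D on the line: neither B nor D lies between A and C.
print(cross_ratio(0, 2, 1, 3))   # positive
# Order A, B, C, D: just B lies between A and C.
print(cross_ratio(0, 1, 2, 3))   # negative
# B divides AC internally 2:1 and D divides AC externally 2:1.
print(cross_ratio(0, 2, 3, 6))   # the harmonic case, -1
```

Exact fractions are used so that the value −1 in the harmonic case is obtained without rounding.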
From a given centre Q, project collinear points A, B, C, D on Π into
collinear points A′, B′, C′, D′ on Π′. All eight points and Q lie on
one plane, cutting Π in the line λ and Π′ in the line λ′. Fig. 8.7c is
drawn in this plane. The invariant property of projective geometry is:

THEOREM: The cross-ratio of four points is invariant under projection:

$$(ABCD) = (A'B'C'D').$$

[Fig. 8.7c]

To prove, use formulae for the area of a triangle (Appendix A.8):

$$\tfrac{1}{2}\, QA \cdot QB \sin\angle AQB = \text{Area } QAB = \tfrac{1}{2}\, h \cdot AB$$

where h is the length of the perpendicular from Q to the line λ. So:

$$AB = \frac{1}{h}\, QA \cdot QB \sin\angle AQB \quad\text{and similarly for other lengths.}$$

Hence:

$$(ABCD) = \frac{\frac{1}{h}\, QA \cdot QB \sin\angle AQB \times \frac{1}{h}\, QC \cdot QD \sin\angle CQD}{\frac{1}{h}\, QA \cdot QD \sin\angle AQD \times \frac{1}{h}\, QC \cdot QB \sin\angle CQB} = \frac{\sin\angle AQB \cdot \sin\angle CQD}{\sin\angle AQD \cdot \sin\angle CQB}$$

Similarly:

$$(A'B'C'D') = \frac{\sin\angle A'QB' \cdot \sin\angle C'QD'}{\sin\angle A'QD' \cdot \sin\angle C'QB'} = \frac{\sin\angle AQB \cdot \sin\angle CQD}{\sin\angle AQD \cdot \sin\angle CQB} = (ABCD) \qquad \text{Q.E.D.}$$




This is a very remarkable result. In a projection from Π to Π′, the
ratio AB/CB gets changed to A′B′/C′B′. At the same time the ratio
AD/CD is changed to A′D′/C′D′. The invariance of the cross-ratio
simply implies that, no matter what planes are used in the projection,
the ratio AB/CB changes proportionately with the ratio AD/CD.
For example, if one ratio is doubled, so is the other.
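The invariance can be checked on a concrete projection. The sketch below works in the plane through Q and assumes a particular set-up (not taken from Fig. 8.7c): the line λ is the axis y = 0, the centre is Q = (0, b), and λ′ is the line y = mx + c. Solving the line QA with λ′ for A = (t, 0) gives the image abscissa x = (b − c)t/(b + mt), the formula quoted in 8.9 Ex. 30. Signed lengths along λ′ are the differences of x multiplied by the constant √(1 + m²), which cancels in every ratio. All function names are hypothetical.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """(ABCD) = (AB . CD) / (AD . CB) with signed lengths."""
    return ((b - a) * (d - c)) / ((d - a) * (b - c))


def project(t, b, m, c):
    """Abscissa on y = m x + c of the image of (t, 0), projected from Q = (0, b)."""
    return (b - c) * t / (b + m * t)


def ratio_to_references(a, p, c):
    """The ratio AP/CP that fixes P in relation to the reference points A and C."""
    return (p - a) / (p - c)


b, m, c = Fraction(4), Fraction(1, 2), Fraction(1)
A, B, C, D = (Fraction(t) for t in (0, 1, 2, 3))
A1, B1, C1, D1 = (project(t, b, m, c) for t in (A, B, C, D))

print(cross_ratio(A, B, C, D), cross_ratio(A1, B1, C1, D1))

# AB/CB and AD/CD both change, but by the same factor.
factor_b = ratio_to_references(A1, B1, C1) / ratio_to_references(A, B, C)
factor_d = ratio_to_references(A1, D1, C1) / ratio_to_references(A, D, C)
print(factor_b, factor_d)
```

In this example the two ratios are each multiplied by 5/4, so their quotient, the cross-ratio, is unchanged.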

Projective geometry has to do with the art of perspective drawing.
Moreover projective methods can be used to establish geometric
properties (not depending on distances and angles), properties which
are proved only with great difficulty by traditional methods. Consider,
as a simple example, the ‘complete quadrilateral’ ABCDEF shown in
Fig. 8.7d. The diagonals AC, BD and EF meet in three points P, Q
and R. The result is: P and Q divide the diagonal AC internally and
externally in the same ratio, and similarly for the other two diagonals.

[Fig. 8.7d]

Proof: by projection from the line BD to the line EF, with
centre at A, the cross-ratios (BPDR) and (EQFR) are equal. Similarly,
by projection from BD to AC, with centre at E, the cross-ratios
(BPDR) and (APCQ) are equal. Hence:

(APCQ) = (BPDR) = (EQFR) = λ (say).

It remains to find λ. By projection from BD to AC, with centre at F,
the cross-ratios (BPDR) and (CPAQ) are equal. Hence

(CPAQ) = λ = (APCQ).

So:

$$\lambda^2 = (APCQ) \times (CPAQ) = \frac{AP \cdot CQ}{AQ \cdot CP} \cdot \frac{CP \cdot AQ}{CQ \cdot AP} = 1$$

Hence λ = ±1 and, by the grouping of the points in (APCQ), λ = −1.
The cross-ratio (APCQ) = −1 on the diagonal AC. The case of cross-
ratio equal to −1 has already been noted (8.9 Ex. 28 again); it
implies that P and Q divide AC internally and externally in the same
ratio. Similarly, (BPDR) = −1 and (EQFR) = −1 with the same
result for the other two diagonals. Q.E.D.




8.8. Homogeneous co-ordinates. Consider the lack of symmetry 
between points and lines noted in 8.2. We say that any two points 
are joined by a unique line and we would like to say that any two 
lines meet in a unique point. We are held up by the difficulty about 
parallel lines. Again, consider the relation ax + by + c = 0. If the ratios
a : b : c are given (which makes two parameters), the relation shows
a variable point (x, y) on a fixed line. If the values of x and y are
given (again two parameters), the relation shows a variable line (as
a, b and c take different values) always passing through a fixed point
(x, y). The relation has a dual interpretation, according as a : b : c
or x and y are given, i.e. it is a variable point on a fixed line or a
variable line through a fixed point. The lack of symmetry arises
since the co-ordinates of the fixed point are two numbers x and y
whereas the ‘co-ordinates’ of the fixed line are two ratios of numbers
a : b : c. Everything would be symmetric if the co-ordinates of a
fixed point were (x, y, z), the ratios x : y : z being given. The idea
here is to represent a point on a plane by three co-ordinates rather
than two, i.e. ‘homogeneous co-ordinates’ (x, y, z) considered as two
ratios x : y : z. This is in the interests
of symmetry, of conceptual neatness. 
Moreover, though it may seem con- 
fusing, it is not without practical use. 

P is a point (x, y) referred to axes Oxy in a plane Π. In Fig. 8.8a, P is
located by OM = x and ON = y. Draw OZ at right angles to Π and take Q
on it, unit distance below O. Draw QX and QY parallel to Ox and Oy
respectively. Then, referred to axes QXYZ, any point R in three
dimensions has co-ordinates (X, Y, Z). In particular, O is (0, 0, 1)
and P is (x, y, 1).

[Fig. 8.8a]

Consider any point P′ on the line QR, with co-ordinates
(λX, λY, λZ) for various real values λ. As special cases, λ = 0 gives
Q, λ = 1 gives R and λ = 1/Z gives P, provided that P is on QR:
x = X/Z, y = Y/Z. Hence any point on QR can be assigned the same
co-ordinates (X, Y, Z) as long as only the ratios X : Y : Z are used,
i.e. treating (λX, λY, λZ) as (X, Y, Z). In particular, the point P
in Π has the co-ordinates (X, Y, Z). This means that its actual
co-ordinates in three dimensions are (X/Z, Y/Z, 1) referred to
QXYZ, and that its actual co-ordinates in the two dimensions of Π
are (x, y) referred to Oxy, where x = X/Z and y = Y/Z. With this
convention, any projection from Q (e.g. of P into P′) leaves the
homogeneous co-ordinates (X, Y, Z) of a point unaltered. The actual
co-ordinates in three dimensions are changed, (λX, λY, λZ) for
various λ, but this matters only if we are concerned with distances
and angles.

In a plane Oxy, a point is (x, y) and a line is ax + by + c = 0. To
convert to homogeneous co-ordinates, write x = X/Z, y = Y/Z,
denote the point by (X, Y, Z) and the line by aX + bY + cZ = 0.
There is now complete symmetry: a point (X, Y, Z) given by ratios
X : Y : Z and a line (a, b, c) given by ratios a : b : c. Moreover, in
this form, nothing is changed by projection from the point Q from
one plane to another.
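The conversion is easy to mechanise. The sketch below is illustrative only: the names `to_homogeneous`, `to_cartesian`, `same_point` and `on_line` are hypothetical, and exact rational arithmetic is assumed. It shows that (λX, λY, λZ) names the same point as (X, Y, Z) and satisfies the same homogeneous line equation.

```python
from fractions import Fraction


def to_homogeneous(x, y, scale=1):
    """Return homogeneous co-ordinates (X, Y, Z) of (x, y); any non-zero scale will do."""
    if scale == 0:
        raise ValueError("scale must be non-zero for a finite point")
    return (x * scale, y * scale, scale)


def to_cartesian(point):
    """Return (x, y) = (X/Z, Y/Z); only defined for finite points (Z != 0)."""
    X, Y, Z = point
    if Z == 0:
        raise ValueError("a point with Z = 0 has no actual co-ordinates")
    return (Fraction(X) / Fraction(Z), Fraction(Y) / Fraction(Z))


def same_point(p, q):
    """True if X : Y : Z are the same ratios for p and q (neither being (0, 0, 0))."""
    if p == (0, 0, 0) or q == (0, 0, 0):
        return False
    return (p[0] * q[1] == q[0] * p[1]
            and p[1] * q[2] == q[1] * p[2]
            and p[2] * q[0] == q[2] * p[0])


def on_line(line, point):
    """True if a X + b Y + c Z = 0."""
    return sum(coefficient * value for coefficient, value in zip(line, point)) == 0


P = to_homogeneous(2, 3)             # (2, 3, 1)
P_scaled = to_homogeneous(2, 3, 3)   # (6, 9, 3), the same point
line = (1, 1, -5)                    # x + y - 5 = 0
print(same_point(P, P_scaled), to_cartesian(P_scaled))
print(on_line(line, P), on_line(line, P_scaled))
```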

We are now in a position to handle parallel lines. Consider a plane Π′
cutting the axes QXYZ in A′, B′ and C′ respectively (Fig. 8.8b). Project
from Π′ onto Π with centre Q so that P′ goes into P, (X, Y, Z) being
the homogeneous co-ordinates of each. Then C′ goes into O, both with
co-ordinates (0, 0, Z). The line C′A′ goes into Ox, both represented
by Y = 0. The line C′B′ goes into Oy, or X = 0. What of A′, B′ and the
line A′B′? A′ has co-ordinates (X, 0, 0) and so has its image in Π.
Since QA′ is parallel to Π, there is no (finite) point of
intersection. But we have a ‘point’ specified by co-ordinates (X, 0, 0).
By convention, label this point in Π as the ‘point at infinity’ on Ox,
co-ordinates (X, 0, 0). Similarly, B′ goes into the ‘point at infinity’
on Oy, (0, Y, 0). Finally, the line A′B′ is given by Z = 0 and this goes
into the ‘line at infinity’ on Π, equation Z = 0.

[Fig. 8.8b]

Homogeneous co-ordinates (X, Y, Z) for points in a plane Π
therefore work as follows. If Z ≠ 0, we have a (finite) point in the
plane, with homogeneous co-ordinates (X, Y, Z) or (X/Z, Y/Z, 1).
The actual co-ordinates referred to Oxy are x = X/Z, y = Y/Z. If Z = 0,
the point (X, Y, 0) is a point at infinity, e.g. (X, 0, 0) at infinity on
Ox and (0, Y, 0) at infinity on Oy. In terms of lines, aX + bY + cZ = 0
is any line in the plane Π in homogeneous co-ordinates. As special
cases, X = 0 is recognised as the line Oy and Y = 0 as the line Ox.
Further, Z = 0 is now to be interpreted as the line at infinity. In
short, to get a point at infinity, on the line at infinity, simply write
Z = 0. The actual co-ordinates (X/Z and Y/Z) of a point referred to
Oxy have no meaning for points at infinity; but the homogeneous
co-ordinates can be written.

To pursue the symmetry between points and lines, consider the 
dual cases: 

(i) Given two points $P_1(X_1, Y_1, Z_1)$ and $P_2(X_2, Y_2, Z_2)$, find the
line joining them, i.e. find a : b : c so that aX + bY + cZ = 0 goes
through $P_1$ and $P_2$:

$aX_1 + bY_1 + cZ_1 = 0$ and $aX_2 + bY_2 + cZ_2 = 0$

giving: $\dfrac{a}{Y_1Z_2 - Y_2Z_1} = \dfrac{b}{Z_1X_2 - Z_2X_1} = \dfrac{c}{X_1Y_2 - X_2Y_1}$ ...... (1)

The denominators in (1) are all zero only if $X_1 : Y_1 : Z_1$ are the same
ratios as $X_2 : Y_2 : Z_2$, i.e. only if $P_1$ and $P_2$ are the same point. If
$P_1$ and $P_2$ are distinct, then there are always ratios a : b : c for the
line $P_1P_2$, given by (1). This is true of points at infinity, as well as
finite points. Write $Z_1 = Z_2 = 0$ ($P_1$ and $P_2$ points at infinity) and (1)
gives a = b = 0, i.e. the line $P_1P_2$ is Z = 0. The line joining any two
points at infinity is the line at infinity.
(ii) Given two lines

$\lambda_1$ ($a_1X + b_1Y + c_1Z = 0$) and $\lambda_2$ ($a_2X + b_2Y + c_2Z = 0$),

find their point of intersection, i.e. find X : Y : Z so that both
equations hold:

$a_1X + b_1Y + c_1Z = 0$ and $a_2X + b_2Y + c_2Z = 0$

giving: $\dfrac{X}{b_1c_2 - b_2c_1} = \dfrac{Y}{c_1a_2 - c_2a_1} = \dfrac{Z}{a_1b_2 - a_2b_1}$ ...... (2)

which is the dual of (1). Again, if the lines $\lambda_1$ and $\lambda_2$ are distinct
($a_1 : b_1 : c_1$ different from $a_2 : b_2 : c_2$), then there are always ratios
X : Y : Z for the point of intersection, given by (2). This is true of
parallel lines as well as lines which are not parallel. Write
$a_1/b_1 = a_2/b_2$ ($\lambda_1$ and $\lambda_2$ parallel) and (2) gives Z = 0, i.e. a point at
infinity. The point of intersection of parallel lines is a point at
infinity. The dual is complete; points at infinity are ‘where parallel
lines meet’.
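Formulae (1) and (2) have the same shape: each is the vector (cross) product of two triples. The sketch below uses that observation, which is a standard fact rather than something stated in the text; the names `cross`, `join` and `meet` are hypothetical.

```python
def cross(u, v):
    """Vector product of two triples; its components are the denominators in (1) and (2)."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def join(p1, p2):
    """Ratios a : b : c of the line through two distinct points, as in (1)."""
    return cross(p1, p2)


def meet(l1, l2):
    """Ratios X : Y : Z of the point on two distinct lines, as in (2)."""
    return cross(l1, l2)


# The line through O (0, 0, 1) and the point (1, 1, 1) is -X + Y = 0.
print(join((0, 0, 1), (1, 1, 1)))
# The parallel lines x + y - 1 = 0 and x + y - 2 = 0 meet where Z = 0.
print(meet((1, 1, -1), (1, 1, -2)))
# Two points at infinity are joined by the line at infinity Z = 0.
print(join((1, 0, 0), (0, 1, 0)))
```

A result of (0, 0, 0) would signal that the two inputs have the same ratios, i.e. that the points (or lines) are not distinct.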

Further tidying up is now possible. Two lines meet in one point 
since there is a single (unique) solution of two (distinct) linear 
equations. What of loci given by relations of second or higher degree? 
The number of points of intersection should clearly be dictated by 
the number of roots of the corresponding equations. Let us explore 
this question for the circle, one of the curves with a second-degree 
equation. Given two second-degree equations in x and y, the elimina- 
tion of y generally produces an equation in x of the fourth degree 
with four roots, i.e. four real roots, or two real roots and a conjugate 
complex pair, or two conjugate complex pairs. Hence two circles 
should meet in four points: four real points, or two real points and 
two ‘imaginary’ points, or four ‘imaginary’ points. This doesn’t seem 
to fit with facts, for no pair of circles appears to give four points of 
intersection ; two points of intersection (real or ‘imaginary’) are all 
we can locate.* What of the missing pair? They can be located by 
use of homogeneous co-ordinates. 

The equation of a circle with given centre and radius (8.6) is

$(x - a)^2 + (y - b)^2 = r^2$
or $(X - aZ)^2 + (Y - bZ)^2 = r^2Z^2$ ...... (3)

in homogeneous co-ordinates. The line at infinity Z = 0 intersects the
circle in two points (X, Y, 0) where X : Y is given by (3) on putting
Z = 0:

$X^2 + Y^2 = 0$ or $Y = \pm iX$.

Since only the ratio matters, write X = 1, Y = ±i. Hence the circle
(3) cuts the line at infinity in the two points (1, i, 0) and (1, −i, 0).
Any line cuts a circle in two points, real or ‘imaginary’ (8.9 Ex. 32).
We have found the two points where the circle cuts the line at infinity;
they are ‘imaginary’ as well as on the line at infinity. What is so
remarkable, however, is that the points (1, ±i, 0) do not depend on
a, b and r, the parameters in (3) which locate the centre and radius.


* Compare the case of two ellipses, also with second-degree equations, where four
points of intersection (real or imaginary) can be located. See 8.9 Ex. 34.

All circles go through the same pair of points (1, ±i, 0), called the
circular points at infinity. The answer to our question is now clear
(and further illustrated in 8.9 Ex. 33). Two circles do intersect in
four points, i.e. the two (imaginary) circular points at infinity,
together with two other points, real or imaginary.
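That the circular points lie on every circle can be confirmed with complex arithmetic. The sketch below evaluates the left side minus the right side of the homogeneous circle equation (3); the function name `circle_form` and the sample parameters are assumptions for the example.

```python
def circle_form(point, a, b, r):
    """Return (X - aZ)^2 + (Y - bZ)^2 - r^2 Z^2, which is zero for points on the circle."""
    X, Y, Z = point
    dx = X - a * Z
    dy = Y - b * Z
    return dx * dx + dy * dy - r * r * Z * Z


circular_points = [(1, 1j, 0), (1, -1j, 0)]

# Three circles with different centres (a, b) and radii r.
circles = [(0, 0, 1), (2, -3, 5), (-7, 4, 10)]

for a, b, r in circles:
    print([circle_form(p, a, b, r) == 0 for p in circular_points])

# A finite point for comparison: (a + r, b) lies on the circle, the centre does not.
print(circle_form((7, -3, 1), 2, -3, 5) == 0, circle_form((2, -3, 1), 2, -3, 5) == 0)
```

Because Z = 0 removes every term containing a, b or r, the check succeeds for any choice of the three parameters.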


8.9. Exercises 


1. Rigid motions. Show that a horizontal translation of a figure in a plane 
(as in Fig. 6.3c) can be achieved by two reflections, by turning the figure over 
a vertical line and over again. 

2. Show that the result of Ex. 1 holds for a rotation about a fixed point O, 
by turning the figure over and over again about lines through O. Generalise 
for combinations of translations and rotations and hence for all rigid motions. 
Distinguish between the result of an even number and an odd number of
reflections.

3. Gradients. In a triangle OAB, take OA horizontal and OB rising (e.g. a road
on a hill). Interpret the statement that ‘the gradient of OB is 1 in 12’ and indicate
that there may be some uncertainty. Suggest a strict
definition of gradient as the slope of a line (8.6).

4. Demonstrate the validity of the equation of
PR to PQ − PS = PQ + (−PS) in Fig. 8.9a by interpreting
(−PS) as PS′.

[Fig. 8.9a]

5. Vectors as an additive group. From the properties (ii) of 8.3,
check that the set of all vectors is a commutative group under
addition, with identity the zero vector.

6. In a vector space V (over F), show that 0u = 0 for all vectors u. Proceed:
write u + 0u = 1u + 0u = (1 + 0)u by rules S2 and S4. Use the facts that 0 in F
is such that 1 + 0 = 1 and 0 in V is such that u + 0 = u, and the cancellation
rule for the additive group of vectors.

7. By a method similar to Ex. 6, show that a0 = 0 for all scalars a, where 0
is the zero vector. Compare with a0 = 0 for a field
(0 the zero scalar) as in 6.9 Ex. 20.

8. The additive group of the vector space V has
both u and its negative −u. Proceed: u + (−u) =
0 = 0u = {1 + (−1)}u, and reduce by use of S2 and
S4. Hence establish that (−1)u = −u.

9. Euclidean space: sum as resultant of vectors.
The sum of $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ is defined as
$(x_1 + x_2, y_1 + y_2)$. Let P complete the parallelogram
as in Fig. 8.9b. Show $M_1M = x_2$, $NP = y_2$ and hence
$OM = x_1 + x_2$, $MP = y_1 + y_2$. Deduce that P is the sum of $P_1$ and $P_2$, OP being
the resultant of $OP_1$ and $OP_2$. Extend to Euclidean space of 3 dimensions.

[Fig. 8.9b]




10. From the definition of the sum of $(x_1, y_1)$ and $(x_2, y_2)$ as $(x_1 + x_2, y_1 + y_2)$
show that the negative of (x, y) in the additive group of vectors is (−x, −y)
and that the difference: $(x_2, y_2)$ less $(x_1, y_1)$ is $(x_2 - x_1, y_2 - y_1)$. Show the point
$P'(x_2 - x_1, y_2 - y_1)$ graphically relative to $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ and that
$OP_2$ is the resultant of $OP_1$ and $OP'$.

11. Develop the geometric concept of a vector PQ as the relative location of
two points P and Q (with no fixed position) by taking an origin O in the plane
and showing that the vector PQ is equivalent to OR, i.e. to the point R once O
is fixed (Fig. 8.9c). Further, if axes Oxy are selected to give co-ordinates
$P(x_1, y_1)$ and $Q(x_2, y_2)$, then the vector PQ is equivalent
to the point $R(x_2 - x_1, y_2 - y_1)$. Hence show there is
no inconsistency in taking an algebraic vector (x, y)
either as a geometric point (x, y) or as a geometric
vector PQ, where x and y are the differences of the
co-ordinates of P and Q.

[Fig. 8.9c]

12. Follow up Ex. 11 by showing that the distance
between two vectors (e.g. PQ of Fig. 8.9b) is the same
concept as the length of one vector (e.g. OR of Fig. 8.9c).

13. Lines defined by scalar products. $P(x_1, y_1)$ and $Q(x_2, y_2)$ are two fixed
points referred to axes Oxy. If S(x, y) divides PQ in the ratio λ : 1 − λ where
0 ≤ λ ≤ 1, show that $x = (1 - \lambda)x_1 + \lambda x_2$ and $y = (1 - \lambda)y_1 + \lambda y_2$. Check for λ = 0, ½,
1. Use R of Ex. 11 and take T on OR in ratio λ : 1 − λ. Show that T has
co-ordinates $\lambda(x_2 - x_1)$ and $\lambda(y_2 - y_1)$, i.e. T is the scalar product of R for variable
scalars λ. Hence show that the line PQ can be defined by varying S corresponding
to varying T, i.e. by varying scalar products. Deduce its equation:

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1}$$

with common value λ. See (3) of 8.6. Is the restriction 0 ≤ λ ≤ 1 necessary?

14. Mapping the vectors (−x, y). Show that the mapping of the vectors
(−x, y) for x and y positive, is a reflection in Oy of the mapping of the vectors
(x, y) (Fig. 8.9d). Establish that the mapping is completed once the point A′ for
the vector (−1, 0) is inserted and that
the vector OA′ is the negative of OA, i.e.
the product by the scalar −1.

[Fig. 8.9d]

15. In Fig. 8.9d, take α′ = 180° − α.
Extend the relations

x = ρ cos α and y = ρ sin α

for P(x, y), where x > 0 and y > 0, so
that they apply to P′(−x, y). Show that
cos α′ must be written as OM′/OP′ and
sin α′ as M′P′/OP′ and that this is equivalent
to defining cos (180° − α) = −cos α and sin (180° − α) = sin α (Appendix A.7).




16. Complete the extension of Ex. 14 and 15 to accommodate vectors
(x, y) for all real x and y and trigonometric ratios of angles α where
0° ≤ α ≤ 360°.

17. A line has slope m and intercept c on Oy. From (3) of 8.6, noting that
the line goes through (0, c), show that it has equation y = mx + c. Conversely,
show that the locus y = mx + c is a line of slope m and intercept c on Oy.

18. Show that, if $(x_1, y_1)$ and $(x_2, y_2)$ are points on a line parallel to Ox, then
$y_1 = y_2$. Deduce that the line has equation by + c = 0, or y = constant. Compare
with the line x = constant parallel to Oy. Check from formula (3) of 8.6.

19. The line parallel to Ox has slope zero; what can be said of the slope of
the line parallel to Oy?

20. Show that x + y = 1 is the line through (1, 0) and (0, 1). Generalise and
establish that bx + ay = ab is the line with intercepts a and b on the axes.

*21. The parabola. S is a given point,
distant 2α from a given line d. A point P
moves so that the distance SP is always
equal to the perpendicular distance PN
from d. The locus P is described as a
parabola with focus S. Fix axes so that S
is (0, α) and d is y = −α (Fig. 8.9e) and
show that the equation of the locus is
$y = \dfrac{x^2}{4\alpha}$. (Write expressions for SP and PN
in terms of OM = x, MP = y, and equate.)

[Fig. 8.9e]

*22. Conversely, show that $y = ax^2$ (a > 0) is a parabola with focus at
$S\left(0, \dfrac{1}{4a}\right)$. Show also that $y = ax^2$ (a < 0) is a parabola, the reflection of the locus
of Fig. 8.9e in Ox. This latter parabola can represent the path of a ball thrown
into the air (neglecting air resistance).

23. The group of rigid motions. A transformation sends P(x, y) into
P′(x′, y′) by reflection in Ox; show that x′ = x and y′ = −y and that the distance
between two points is unchanged by the transformation. Use Ex. 2 to
deduce that the same is true of all transformations of the group of rigid
motions.

24. Use (2) of 8.4 to show that the angle between two lines (vectors) is
invariant under a transformation of the group of rigid motions.

25. The group of projections. A figure is projected (from a fixed point Q) from
a plane Π to a plane Π′. Check that the set of all such projections, for various
Π and Π′, forms a group, the identity being the projection from Π to Π and
the inverse of the projection from Π to Π′ being that from Π′ to Π.

26. Points A, B, C and D are on a line λ at distances a, b, c and d respectively
from a fixed point on λ. Show that the cross-ratio

$$(ABCD) = \frac{(b - a)(d - c)}{(d - a)(b - c)}.$$

27. Keep A fixed and permute B, C and D in (ABCD). If one of the six
cross-ratios, (ABCD), (ABDC), ..., has value μ show that the others are
$1/\mu$, $1 - \mu$, $\dfrac{1}{1 - \mu}$, $1 - \dfrac{1}{\mu}$ and $\dfrac{1}{1 - 1/\mu}$. Check by writing
$(ABCD) = \dfrac{b(d - c)}{d(b - c)}$ (Ex. 26, case a = 0) and by permuting b, c and d.

28. Harmonic division. Show that (ABCD) = −1 is equivalent to

AB/BC = AD/CD,

i.e. B and D divide the segment AC internally and externally in the same ratio.
*29. Pencils of lines. A set of lines {a, b, c, ...} passing through a fixed point
P form a pencil (Fig. 8.9f). If (ab) denotes the angle between lines a and b,
define the cross-ratio of a pencil of four lines as
$\dfrac{\sin (ab) \sin (cd)}{\sin (ad) \sin (cb)}$ and show that
it equals (ABCD) where A, B, C and D are the intercepts of the pencil and a
line λ not through P. Deduce that this cross-ratio is also invariant under
projections. Comment on the duality displayed between points and lines.

[Fig. 8.9g]

*30. In projecting (from a fixed point Q) from line λ to line λ′, choose axes
as shown in Fig. 8.9g. The point A′(x, y) on λ′ corresponds to a point A(t, 0)
on λ. By solving the equations for the line QA and that for λ′ (y = mx + c),
show that x = (b − c)t/(b + mt). Deduce that the transformation of such a
projection can always be written algebraically as x = αt/(β + γt) for parameters α,
β and γ.

31. In homogeneous co-ordinates, see that the line joining O (0, 0, 1) to a
fixed point $P(X_1, Y_1, Z_1)$ is aX + bY = 0 where $a/b = -Y_1/X_1$. Similarly show
that the line joining (1, 0, 0) and $(X_1, Y_1, Z_1)$ is bY + cZ = 0 where $b/c = -Z_1/Y_1$.
Interpret as a line parallel to Ox.

32. A circle and a line are given; write their equations: $x^2 + y^2 = r^2$ and
ax + by + c = 0 (the origin being at the centre of the circle). Show that there
are two points of intersection. For what values of a : b : c are they imaginary?

33. Illustrate that two circles cut in only two points (real, coincident or
imaginary) by finding the intersections of $x^2 + y^2 = 1$ with each of

$(x - 1)^2 + y^2 = 1$, $(x - 2)^2 + y^2 = 1$ and $(x - 3)^2 + y^2 = 1$.

Illustrate graphically. Put into homogeneous co-ordinates and find in each
case the circular points at infinity (1, ±i, 0) as intersections.




*34. The ellipse. An ellipse with fixed axes (taken as Ox and Oy) is the locus
with equation $x^2/a^2 + y^2/b^2 = 1$ for parameters a and b. Solve $3x^2 + y^2 = 1$ and
$x^2 + 3y^2 = 1$ and illustrate that two such ellipses can intersect in four real
points, here (±½, ±½). Show that nothing is added by putting the equations
into homogeneous co-ordinates.

*35. Conic sections. A circle C is given in the plane Π. Project from a point
Q outside Π onto another plane Π′. Show that C is sent into a circle, ellipse,
parabola or hyperbola according to the position of Π′. These are ‘conic
sections’, i.e. plane sections of the cone with vertex Q and base C.

*36. Affine geometry. As a special case of projective geometry, project from a
plane Π to a plane Π′ by parallel lines. If Π and Π′ are not parallel planes,
show that a circle in Π is sent into an ellipse in Π′. This is ‘affine geometry’ and
the corresponding transformations are those of the ‘affine group’ (7.9, Ex. 24).


CHAPTER 9 


LIMITS AND CONTINUITY 


9.1. Functions of a real variable. Before embarking on an exploration 
of new territory, that of mathematical analysis or the calculus, we 
can profitably pause to take stock of our equipment. Analysis can 
be regarded as a very specialised and highly elaborate extension of 
algebra, developed from two particular concepts: the real number 
system and the functional relation. These are somewhat incidental 
in algebra but they form the essential foundation on which the 
powerful techniques of the calculus rest. 

In algebra we are concerned with sets of elements which may or 
may not be numbers. Even when we deal with numbers, we are often 
quite happy to stick to rational numbers, making up an ordered field 
with the long list of properties of 2.2. It is true that purely algebraic 
considerations require an extension of the number system to real and 
complex numbers; without them we have no zeros of polynomials as 
simple as $x^2 - 2$ or $x^2 + 2$. But the exploitation of real numbers is the
job we undertake in analysis rather than in algebra. The property to 
exploit is that real numbers form a complete ordered field (2.4 and 
6.7): 

If a set of real numbers has a lower bound, then it has a GLB; if a
set has an upper bound, then it has a LUB.

This is the feature which distinguishes the real numbers from the
integers and rationals. A set of numbers x such that $x^2 < 2$ (or equally
$x^2 \le 2$)* is bounded in both directions; it has no LUB or GLB if x
is confined to the rationals whereas it has LUB = $\sqrt{2}$ and GLB = $-\sqrt{2}$
if x is a real number.

* There is no difference between the two sets if x is rational. The difference when x
is real is that the set such that $x^2 < 2$ does not include its LUB $\sqrt{2}$ whereas the other
set does (and similarly for the GLB = $-\sqrt{2}$).

In algebra, also, we are interested in relations between two sets
and, rather incidentally, with the particular kind of relation called a
function (7.3). In analysis, however, the particular case is the one
pursued. Analysis deals with sets of real numbers and with functional 
relations between them; it is the study of real-valued functions of a 
real variable. It is important to be quite explicit on this limitation. 
A relation R is any subset of the Cartesian product X × Y of two sets
X and Y of any elements, i.e. a set of ordered pairs

$\{(x, y) \mid x \in X,\ y \in Y,\ yRx\}$,

where yRx is a statement linking x and y. The relation is a function
when the statement is such that, if x ∈ X has a corresponding y ∈ Y,
then y is unique. Let X be limited to the domain of the function, i.e.
the set of first elements x of the pairs (x, y) which correspond to some
y; let Y be limited to the range of the function, i.e. the set of second
elements y of the pairs (x, y) which correspond to some x. Then a
function is a many-one mapping f of the set X onto the set Y. The
rule which specifies which y corresponds to a given x is a statement
yfx, usually written y = f(x). There is, as yet, no limitation on the
kind of elements comprised in the sets X and Y. Now suppose that
X and Y are sets of real numbers, where x ∈ X and y ∈ Y are called
variables, as opposed to single real numbers which are constants. To
distinguish the variables of a function, x in the domain X of the
function is called the independent variable and y in the range Y of the
function is the dependent variable. We have then a real-valued function
of a real variable, a many-one mapping of one set X of real numbers
(the domain of the independent variable x) onto another set Y of real
numbers (the range of the dependent variable y).

There are always two things to specify for a function of a real 
variable: 

(i) the sets of real numbers involved in the many-one mapping
X → Y, i.e. the domain X and the range Y of the function, and
(ii) the rule y = f(x) of the mapping, i.e. how to get from a given x
of the domain to the unique image y in the range.

Both are apparent in the full notation for a function as the set
$\{(x, y) \mid x \in X,\ y \in Y,\ y = f(x)\}$. Both are stressed in the shorter
notation ‘f: X → Y’, as sometimes used for a function having the rule
f in the mapping of X onto Y.
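The definition of a function as a set of ordered pairs can be tried out on small finite sets. The sketch below is only an illustration on integers (the text is concerned with sets of real numbers), and the names `is_function`, `domain` and `range_of` are hypothetical.

```python
def is_function(pairs):
    """True if no first element x is paired with two different second elements y."""
    image = {}
    for x, y in pairs:
        # setdefault records y for a new x, or returns the y already recorded.
        if image.setdefault(x, y) != y:
            return False
    return True


def domain(pairs):
    """The set of first elements of the pairs."""
    return {x for x, _ in pairs}


def range_of(pairs):
    """The set of second elements of the pairs."""
    return {y for _, y in pairs}


X = range(-5, 6)

# The rule y = x^2: many-one, since x and -x share an image.
square = {(x, x * x) for x in X}
# The relation x^2 + y^2 = 25: one x can correspond to two values of y.
circle = {(x, y) for x in X for y in X if x * x + y * y == 25}

print(is_function(square), sorted(range_of(square)))
print(is_function(circle), sorted(domain(circle)))
```

The second relation fails the test because, for example, both (3, 4) and (3, −4) belong to it.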

In practice, we adopt a looser terminology. We have in mind the
rule y = f(x) of a function and (perhaps to a less extent) the domain
X on which it is defined. The range Y is to be found by taking all
possible x’s in the domain X and seeing what y’s correspond. Hence
we speak of ‘the function f whose values are y = f(x) defined on the
domain X’. Quite usually, we make no explicit reference to the
domain X, speaking simply of ‘the function y = f(x)’. This is indeed
an elliptical expression since we are transferring the term ‘function’
from the mapping of one set onto another (which is what it is) to the
particular rule y = f(x) which specifies how y is obtained from x in
the mapping. This is all very well, and a great saving of time,
provided we remember what is implied, provided we leave no doubt
about the domain of the function and about the consequent range of
its values.*

In practice, also, the domain of a function is usually described
loosely. Again there is no difficulty, provided we remember that the
domain is a set of real numbers. So, if a function has domain
{x | x a real number}, we may say that the function is defined ‘for all
x’. Or, if the domain is the set of real numbers between 0 and 1
inclusive, i.e. the set {x | x a real number, 0 ≤ x ≤ 1}, we may speak
of the function as defined ‘on 0 ≤ x ≤ 1’.

* Note the definition given by Dirichlet (1805-59): f(x) is a real function of a real
variable if, to every real number x, there corresponds a real number f(x). Here the
second ‘f(x)’ is the rule for getting from x in X to y = f(x) in Y. Something needs to be
specified about what set X of real numbers is the domain and hence about what
set Y of real numbers is the range. The first ‘f(x)’ is then to be interpreted as
f: X → Y.

For example, consider the linear function y = 2x − 1. Here we may
mean that the set X of all real numbers is mapped onto the set Y of
all real numbers by the linear rule y = 2x − 1, and it may be safe
enough to leave the domain X understood. Or, we may have some
other domain X in mind, e.g. the set X = {x | x a real number,
0 ≤ x ≤ 1}, in which case we may amplify: the linear function y = 2x − 1
defined on 0 ≤ x ≤ 1. Again, write the function $y = \sqrt{x}$. Here we must
be more careful. The rule $y = \sqrt{x}$, of the mapping X → Y which is the
function, is to be interpreted: given a real number x, then y is the
value obtained as the positive square root of x. Hence, x must be
zero or positive and y also zero or positive. The domain X of the
function cannot be the set of all real numbers. It can be the set of all
non-negative real numbers or any subset such as all real x ≥ 1. We
must say what we have in mind, e.g. the function $y = \sqrt{x}$ defined on
x ≥ 0 (or on x ≥ 1, or whatever the domain may be).

In this chapter, the main development is in terms of a general 
function, and of its general properties, without specifying any 
particular rule. We need a general and flexible notation for the rule 
of the function, one which is capable of distinguishing many different 
functions. Ringing the changes on small and capital letters, and on the 
English and Greek alphabets, we can write: 


f(x), g(x), ... F(x), G(x), ... φ(x), ψ(x), ....


A notation such as y=f(x) applies to the rule of the function; it 
needs to be completed by specification of the domain X. A different 
function arises if the rule is changed or if the domain is varied (or 
both). So y=f(x) defined on X is a different function from y=g(x)
defined on X; different letters f and g indicate that the rule is changed.
Equally, y=f(x) defined on X₁ is a different function from y=f(x)
defined on X₂ since they have different domains X₁ and X₂; the
same letter f is used when the rule for getting from x to y is not
varied. For example, y=x² defined for all x, y=x² defined for x≥0,
y=√x defined for x≥0 are three different functions. The rule is the
same for the first two so that y=f(x) can be used for both, with
f(x)=x² here. The rule is different for the last two so that, if y=f(x)
is used for one, then a different letter (say g) is needed in writing
y=g(x) for the other.
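The distinction between rule and domain can be tried out in a short Python sketch. The class name Function and its two fields are hypothetical, introduced here only for illustration; the domain is represented by a membership test rather than by a set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Function:
    """A real function as a rule together with a domain (illustrative helper)."""
    rule: Callable[[float], float]
    in_domain: Callable[[float], bool]

    def __call__(self, x: float) -> float:
        # A value outside the domain has no image under the mapping.
        if not self.in_domain(x):
            raise ValueError(f"{x} is outside the domain")
        return self.rule(x)


def square(x: float) -> float:
    return x * x


f_all = Function(square, lambda x: True)        # y = x^2 defined for all x
f_nonneg = Function(square, lambda x: x >= 0)   # y = x^2 defined for x >= 0

# Same rule, so the values agree wherever both functions are defined.
same_rule_value = f_all(3.0) == f_nonneg(3.0)

# Different domains, so the two objects are different functions.
try:
    f_nonneg(-3.0)
    rejected = False
except ValueError:
    rejected = True
```

Here f_all and f_nonneg share the rule but not the domain, which is the sense in which the same letter f can serve for two different functions.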

On the other hand, a change in the labels of the variables is purely 
formal, having no effect either on the function or (in particular) on 
the rule of the function. So y=f(x), x=f(y), z=f(u), ... all represent 
the same functional rule and, if the domains (X, Y, U, ...) are the 
same sets of real numbers, they are all the same function. In the first 
we write x for the independent and y for the dependent variable, and 
we simply change these labels for the others. For example, y=x²,
x=y², z=u², ... defined on the domain (X, Y, U, ...) of all real
numbers are all the same function. 


9.2. Algebraic and other functions. In using particular functions for 
illustration, we draw upon rules obtained by algebraic operations. 
The following are some examples: 


      Rule y=f(x)          Domain X
(i)   y=1+x+x²             all x
(ii)  y=…/(x+1)            all x (x≠−1)
(iii) y=…/√(x−1)           x>1
(iv)  y=…/√(1−x²)          −1<x<1


The importance of the specification of the domain is here further 
illustrated. In each case, the domain X shown is the widest set 
possible for y to be defined at all. Any subset of X can serve equally 
well as a domain. The range Y of each function follows from a con- 
sideration (sometimes quite involved) of the values y can take; as 
shown below, Y is all real y≥3/4 for (i).

These are all cases of algebraic functions, i.e. the rules involve 
polynomials or root extraction (surds) and their ratios. They are 
called ‘algebraic’ since the rules are derived from x and given con- 
stants solely by repeated use of the operations of elementary algebra, 
i.e. the four rational operations (+, −, ×, ÷) together with the
extraction of nth roots (n a positive integer).* The variety of 
elementary functions is extended very considerably later on (Chapter 
12). Note, here, that a function of a real variable can be defined 
perfectly well by a rule which is not algebraic. Each of the following: 

(v) y=least integer not less than x
(vi) y=1 when x integral and y=−1 when x non-integral
(vii) y=1 when x rational and y=−1 when x irrational
satisfies all the conditions for a function of a real variable x, being a 
many-one mapping of the set X of all real numbers onto a set Y of 
real numbers. Y is the set of integers in the first case and the set 
{—1, 1} in the others. 
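Rules (v) and (vi) can be put into a short Python sketch, offered only as an illustration. The names rule_v and rule_vi are hypothetical. Rule (vii) is left out on purpose: every floating-point number is rational, so the rational/irrational test cannot be carried out on machine numbers.

```python
import math


def rule_v(x: float) -> int:
    """(v): the least integer not less than x."""
    return math.ceil(x)


def rule_vi(x: float) -> int:
    """(vi): 1 when x is integral, -1 when x is non-integral."""
    return 1 if float(x).is_integer() else -1


# Sample values; each x has exactly one y, so both rules are many-one mappings.
sample_v = [rule_v(x) for x in (2.3, -2.3, 4.0)]
sample_vi = [rule_vi(x) for x in (3.0, 3.5, -1.0)]
```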

The geometric representation of functions, the basis of co-ordinate 
geometry (8.6), depends on the one-one correspondence between the 
ordered set X of real numbers and the set of points making up a 
directed line in space. This is an isomorphism preserving order; the 
complete ordering of the real numbers is matched by a complete 
ordering of points on a directed line. One representation (as in 7.3)
of a function f: X→Y is a mapping from points on one line to points
on another line. More usually, the mapping of the function y=f(x)
proceeds by associating each pair (x, y) of the function with a point
P (x, y) referred to axes Oxy in a plane (as in 8.6). The result is that a
function corresponds to a locus cut by lines parallel to Oy in no more
than one point. An actual plotting on paper gives a graph of the
function or locus. A graph of the function and locus (i) (y=1+x+x²)
is shown in Fig. 9.2, from which it is clear that the range of y is the
set of all real y≥3/4. In this case, the locus is recognisable as a
‘curve’ (a parabola). In other cases this is not so. An example is the
graph of the function (vi). Fig. 9.2 shows that the locus here is the
set of points on the line y=−1, except that a (countably infinite)
number of points is missing, being replaced by points on the line y=1.

* Root extraction: b=ⁿ√a implies a=bⁿ. It is the inverse of the process of multiplying
a number by itself.


[Fig. 9.2: y=1, x integral; y=−1, x non-integral]

9.3. The algebra of functions. A number of preliminary definitions
and distinctions need to be made. The function y=f(x) is defined on
the domain X. Often X is the set of all real numbers {x | x a real
number} or some ‘half’ set such as that of all positive real numbers
{x | x a real number, x>0}. Such domains correspond to the whole
directed line Ox or to some ‘half-line’ of Ox. On other occasions it is
either necessary, or at least useful, to limit X to an ‘interval’ of real
numbers between specified values a and b, corresponding to a segment
of the directed line Ox. In addition, however we define X, we often
need to consider a ‘neighbourhood’ of real numbers around a particular
value x=α within the domain X. Formally:


DEFINITION: An interval [a, b] is the set {x | x a real number,
a≤x≤b} for given real numbers a and b (a<b). A neighbourhood N
of the real number α is an interval [a, b] containing α within it (a<α<b).


An interval, often written shortly as a≤x≤b, is a closed set of numbers
containing both end values a and b.* A neighbourhood is defined
broadly, as any interval containing α within it. Later, the idea is to
concentrate on ‘small’ neighbourhoods or on neighbourhoods which 
get ‘smaller’; this is something which needs to be developed carefully 
and in connection with the concept of a limit. Further, an interval 
or neighbourhood of values of x is specified so that corresponding 
values of y=f(x) can be examined. Various questions arise. If 
y=f(x) is defined on [a, b], are the values of y bounded or not? If
bounded, do the values of y themselves constitute an interval? Does
a neighbourhood N of x=α correspond to a neighbourhood of values
of y around f(α)? These are by no means trivial questions; they need
to be examined very closely. 

Given various functions f(x), g(x), ..., we can write other functions 
by combining them by the operations of elementary algebra. Simple 
combinations are: 


f(x)+g(x); f(x)−g(x); f(x)×g(x); f(x)/g(x) for g(x)≠0.

In such ways, complicated functions can be split into simpler ones, 
or given functions can be used to define new ones. It seems obvious 
enough but we must say exactly what we mean about the domains 
on which the combination functions are defined. Consider the sum 
function and suppose that y₁=f(x) is defined on X₁, y₂=g(x) on X₂.
Both functions are defined on the set of x’s common to X₁ and X₂,
i.e. on X=X₁∩X₂. If x∈X, both y₁ and y₂ are uniquely defined and
so is y=y₁+y₂. Hence we have the function f(x)+g(x) defined on X,
the intersection of the domains of the separate functions. The domain
is the same for the other combinations, except that the ratio f(x)/g(x)
needs a qualification: it is defined on the domain X=X₁∩X₂ except
that any x for which g(x)=0 is excluded.
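The rule for the domains of combined functions can be sketched in Python. The two functions used, f(x)=√x on x≥0 and g(x)=1−x on all x, are chosen here for illustration and are not taken from the text; all names are hypothetical, and each domain is represented by a membership test.

```python
import math


def f(x: float) -> float:
    return math.sqrt(x)          # rule of f, defined on X1: x >= 0


def g(x: float) -> float:
    return 1.0 - x               # rule of g, defined on X2: all x


def in_X1(x: float) -> bool:
    return x >= 0


def in_X2(x: float) -> bool:
    return True


def in_sum_domain(x: float) -> bool:
    # f+g, f-g and f*g are defined on the intersection of X1 and X2.
    return in_X1(x) and in_X2(x)


def in_ratio_domain(x: float) -> bool:
    # f/g needs the further exclusion of any x with g(x) = 0.
    return in_sum_domain(x) and g(x) != 0


checks = {x: (in_sum_domain(x), in_ratio_domain(x)) for x in (-1.0, 1.0, 4.0)}
```

In this sketch x=1 belongs to the domain of the sum but not to that of the ratio, since g(1)=0.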

A more sophisticated, and very useful, combination of two given 
functions is the composite function or function of a function: 


Given y=F(u) and u=f(x), then y=F{f(x)}.


* Variants can also be defined, open at one end or the other. So a≤x<b can be
denoted as [a, b[, a<x≤b as ]a, b], a<x<b as ]a, b[. The conventional symbols ‘∞’
and ‘−∞’ (read plus or minus ‘infinity’) appear in connection with limits below. By
using them, it is possible to describe complete or ‘half’ sets of real numbers as
intervals, e.g. [a, ∞] for x≥a, [−∞, b] for x≤b, and [−∞, ∞] for all x. The minor
convenience of such conventional notations is probably outweighed by the dangers of using
them; there is the temptation to write the interval [a, b] and then to put b=∞ (which
is without meaning).


Here it is a matter of matching the range of u=f(x) with the domain 
of y = F (u) before we can specify a domain for y = F'{f (x)} as a function 
of x. To illustrate, consider the following cases where the domain X, 
of u=f(x) is ‘all x’: 


      u=f(x)          Range U₁   y=F(u)    Domain U₂      y=F{f(x)}            Domain X
(i)   u=1+x           all u      y=u²      all u          y=(1+x)²             all x
(ii)  u=1+x+x²        u≥3/4      y=1/u     all u (u≠0)    y=1/(1+x+x²)         all x
(iii) u=1−x²          u≤1        y=1/√u    u>0            y=1/√(1−x²)          −1<x<1
(iv)  u=1−√(1+x²)     u≤0        y=1/√u    u>0            y=1/√{1−√(1+x²)}     no x


Let u=f(x) be defined on domain X₁ and have range U₁; let y=F(u)
be defined on domain U₂. In many cases, the range U₁ is either the
same or included within the domain U₂, as illustrated by (i) and (ii).
The domain X₁ of f(x) then goes through to serve as the domain X
of the composite function, i.e. y=F{f(x)} is defined for the same
values of x as f(x). In other cases, the range U₁ and the domain U₂
are overlapping sets, as illustrated by (iii). The composite function
is then defined only for those values of u which belong to both U₁
and U₂ (for u∈U₁∩U₂). The domain X₁ (i.e. x giving u∈U₁) must
be restricted to X (i.e. x giving u∈U₁∩U₂) before it can serve as the
domain of the composite function. So y=F{f(x)} is defined for a
smaller set of values of x than f(x). In (iii), we start with the set of
all x (giving u≤1) but we must restrict to −1<x<1 to ensure that
u>0 (as well as u≤1) and that 1/√u is defined. Finally, as (iv)
illustrates, it is quite possible that the range U₁ and domain U₂ do
not overlap at all, so that the composite function is defined nowhere.
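The matching of range and domain in the cases u=1−x² and u=1−√(1+x²), each followed by y=1/√u, can be checked on a grid of sample values. This is a sketch only: the names are hypothetical, the grid from −2 to 2 in steps of 0.25 is an arbitrary choice, and a grid gives evidence about the domain, not a proof.

```python
import math


def u_iii(x: float) -> float:
    return 1 - x * x                   # range: u <= 1


def u_iv(x: float) -> float:
    return 1 - math.sqrt(1 + x * x)    # range: u <= 0


def in_U2(u: float) -> bool:
    return u > 0                       # domain of y = 1/sqrt(u)


# Sample x from -2 to 2 in steps of 0.25 (exact in binary floating point).
xs = [k / 4 for k in range(-8, 9)]

# Keep only those x whose image u lies in the domain U2 of the outer function.
domain_iii = [x for x in xs if in_U2(u_iii(x))]
domain_iv = [x for x in xs if in_U2(u_iv(x))]
```

On this grid domain_iii contains just the sample points with −1<x<1, while domain_iv is empty, in line with the composite function being defined nowhere.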

The function y =f (x) is an ‘increasing’ one if, whenever we increase 
x from a to b, we also increase f(x) from f(a) to f(b). A similar property
is required of a ‘decreasing’ function. Both are very special cases of 
functions.* 


DEFINITION: y=f(x) is an increasing function on the domain X if
a<b implies f(a)<f(b) for all a and b in X; it is a decreasing function
if a<b implies f(a)>f(b) for all a and b in X.

In Fig. 9.3, cases (i) and (ii) illustrate increasing functions; cases
(iii) and (iv) are functions neither increasing nor decreasing. The
functions are here defined on the interval [α, β], or on some subset of
the interval.


* A function which is either an increasing or a decreasing function on X is often 
described as a monotonic function, i.e. monotone increasing or monotone decreasing. 




As a final distinction, we can enquire when a given function y =f (x) 
also provides an inverse function x=g(y). The given function is a 
many-one mapping f: X→Y. Looking at the mapping the other way
round, we expect in general that, to a specified y in Y, there correspond
many x’s in X. Hence, in general, x is not a function of y at all.


[Fig. 9.3]

The particular case where x is a function of y arises when the mapping
f: X→Y is one-one. The inverse function x=g(y) then exists and it
can be denoted x=f⁻¹(y), i.e. f: X→Y and f⁻¹: Y→X are the same
mapping.

THEOREM: A relation between sets X and Y gives both a function
y=f(x) and an inverse function x=f⁻¹(y) if and only if the mapping
X↔Y is one-one.

The notation, as so often in writing functions, tends to be somewhat
confusing here. The function f and its inverse f⁻¹ must not be taken
as reciprocals: f⁻¹ is not 1/f; this is clear when we try to write the
inverse of y=f(x) as x=(1/f)(y), a meaningless notation. Another
confusion may arise from the practice of switching the labels of variables.
Since x=f(y) is the same function as y=f(x), with variables interchanged,
the same applies to x=f⁻¹(y) and y=f⁻¹(x). We can write
y=f(x) and y=f⁻¹(x) as two different functions, one of which happens
to be the inverse of the other. Then y=f(x) implies x=f⁻¹(y), sticking
to the same labels for the variables. Equally, y=f⁻¹(x) implies
x=f(y), again with the variables having unchanged labels. The
labels are switched in passing from one statement to the other.

For example, take X and Y both as the set of non-negative real 
numbers. Then y=x² and x=√y are alternative ways of writing the
same one-one mapping between X and Y. Hence, y=x² and y=√x
are different functions, but one is the inverse of the other. Keeping
to the same labels for the variables, we say that y=x² implies x=√y.
Equally, we say that y=√x implies x=y².

As another case (9.9 Ex. 10), consider the relation x²+y²=1
shown by a circle of unit radius, centred at O. Hence y=±√(1−x²)
for values −1≤x≤1; this is two-valued and not a function. If we
take only the positive root, then y=√(1−x²) is a function defined on
−1≤x≤1, i.e. the semi-circle above Ox. On inversion, x=±√(1−y²)
which is two-valued and not a function. Hence y=√(1−x²) as a
function defined on −1≤x≤1 does not have an inverse. Now take
y=√(1−x²) defined on 0≤x≤1, i.e. a function represented by the
quarter-circle in the positive quadrant. Then x=√(1−y²) defined on
0≤y≤1. This is also a function and it is the inverse of y=√(1−x²)
defined on 0≤x≤1. It happens that y=√(1−x²) is its own inverse
when both variables are restricted to the interval [0, 1].
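The two claims about the circle can be tried numerically in a short Python sketch. The name f and the sample points are hypothetical choices; floating-point arithmetic is inexact, so the round trip is compared within a small tolerance, and a check at sample points is an illustration, not a proof.

```python
import math


def f(x: float) -> float:
    """Positive root of 1 - x^2, the upper semi-circle."""
    return math.sqrt(1 - x * x)


# On [0, 1] applying the rule twice should return the starting value,
# i.e. the function acts as its own inverse there.
xs = [k / 10 for k in range(0, 11)]
round_trip_ok = all(math.isclose(f(f(x)), x, abs_tol=1e-9) for x in xs)

# On [-1, 1] two different x give the same y, so the mapping is not one-one.
collision = f(-0.6) == f(0.6)
```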

There is a connection between functions which are increasing 
(decreasing) and functions which possess inverses. Fig. 9.3 illustrates. 
An increasing function implies a one-one mapping; it has an inverse, 
which is also an increasing function. A similar result holds for a 
decreasing function. The converse is not true; a one-one mapping 
does not imply that the function is increasing or decreasing. Cases 
(i), (ii) and (iv) of the diagram are all one-one mappings, i.e. functions 
with inverses. Cases (i) and (ii) are increasing functions; case (iv) is 
not increasing or decreasing. Hence, an inverse function can always 
be written for an increasing (or decreasing) function; but inverses 
exist in other cases. 


9.4. Limits of sequences. The exploitation of the real number system 
is effected by introducing and applying the concept of a ‘limit’. 
The concept itself is implicit in the definition of real numbers and 
we can get, easily enough, a general idea of what it implies. 




When we write √2=1.4142... or π=3.14159..., we mean that
the real number can be approximated by rationals, the more 
closely the greater number of decimal places we take. The real 
number itself is, in some sense, the ‘limiting value’ as the number of 
decimal places is increased without end. The difficulty (faced in 2.4) 
is to get a precise formulation, as a foundation to carry the weight of 
the super-structure of properties constructed on it. The same difficulty
appears in designing a precise and general definition of a ‘limit’. It is 
essential to achieve precision here since the whole of the calculus 
rests on the foundation of limit processes. 

The properties of the number system come into play in various 
ways. Suppose a function y=f(x) is defined for all real x. The set of 
all real numbers is indefinitely extended in the sense that there are 
always x’s larger than any specified value, and it is indefinitely dense 
in the sense that there are always x’s between any two specified 
values. If you think of a number, no matter how large, I can always 
produce a larger one; if you think of two numbers, no matter how 
close together, I can always produce one between them. We are thus 
led to ask the questions: what happens to y=f(x) as larger and larger
real numbers are assigned to x, and what happens when real numbers
are assigned closer and closer to some given real number α? The
answer to the first question turns on the concept of the limit of f(x)
as x increases without bound, written Lim_{x→∞} f(x). The other question
is answered by defining the limit of f(x) as x approaches α, written
Lim_{x→α} f(x). In pursuing these matters, we find at some stage that we
require the complete ordered property of real numbers. Suppose f(x)
is bounded over some interval of x. Then, because f(x) has an upper
bound, it must have a LUB; because f(x) has a lower bound, it must
have a GLB. It is because the LUB and GLB exist that we can pin
down the limit.

It is convenient to start with a particular case, the limit of a 
sequence. The basic idea here is that a sequence such as 1.4, 1.41,
1.414, 1.4142, ... has a limit √2. The convenience lies in the fact that
a sequence is easy to handle, with a simple graphical representation. 
But the case is important enough in itself, with applications to be 
pursued in Chapter 11. 

Consider the function f(x) defined on some domain of real numbers
x which includes all positive integers 1, 2, 3, ... . Write the sequence
of real values:

f(1), f(2), f(3), ... f(n), ....


Elementary algebra provides many examples, e.g. sequences (or 
series) of terms and the corresponding sequences of sums of terms. 
A well-known case is that of the G.P. (geometric progression): 
1, r, r², r³, ... rⁿ⁻¹, ... for any real r. The sum of n terms is:

1+r+r²+...+rⁿ⁻¹ = (1−rⁿ)/(1−r)   (r≠1)

which is itself a sequence:

1, (1−r²)/(1−r), (1−r³)/(1−r), ... (1−rⁿ)/(1−r), ....
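The formula for the sum of n terms of a G.P. can be checked against direct addition in a short Python sketch. Exact rational arithmetic (the standard fractions module) is used so that the comparison is not disturbed by rounding; the function names and the choice r=3/2 are illustrative assumptions, and the closed form requires r≠1.

```python
from fractions import Fraction


def gp_sum_direct(r: Fraction, n: int) -> Fraction:
    """Add the n terms 1, r, r^2, ..., r^(n-1) one by one."""
    return sum(r ** k for k in range(n))


def gp_sum_closed(r: Fraction, n: int) -> Fraction:
    """Closed form (1 - r^n)/(1 - r), valid for r != 1."""
    return (1 - r ** n) / (1 - r)


r = Fraction(3, 2)
direct = [gp_sum_direct(r, n) for n in range(1, 6)]
closed = [gp_sum_closed(r, n) for n in range(1, 6)]
```

The two lists agree term by term for the first five sums.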


To illustrate the possibilities fairly fully, consider five particular 
cases of sequences, shown graphically in Fig. 9.4 by plotting f(n) 
against n=1, 2, 3, ...: 


[Fig. 9.4]


(i) f(n)=(1/2)ⁿ⁻¹.
The first five terms are plotted:
1, 1/2, 1/4, 1/8, 1/16, ....
These are the terms of a G.P. with r=1/2.

(ii) f(n)={(3/2)ⁿ−1}/(3/2−1)=2{(3/2)ⁿ−1},
i.e. the sum of n terms of a G.P. with r=3/2. Again the first five terms
are plotted:
1, 5/2, 19/4, 65/8, 211/16, ....

(iii) f(n)={1−(3/4)ⁿ}/(1−3/4)=4{1−(3/4)ⁿ},
i.e. the sum of n terms of a G.P. with r=3/4:
1, 7/4, 37/16, 175/64, 781/256, ....

(iv) f(n)={1−(−3/4)ⁿ}/(1+3/4)=(4/7){1+(3/4)ⁿ} (n odd)
and =(4/7){1−(3/4)ⁿ} (n even)
which is also the sum of a G.P. with r=−3/4:
1, −3/4, +9/16, −27/64, +81/256, ....

(v) f(n)=1 (n odd) and 1/n (n even).
Here the first eight terms are plotted: 1, 1/2, 1, 1/4, 1, 1/6, 1, 1/8, ....

The question is: what happens to f(n) as n increases (through
integral values) without bound? To save words, the notation ‘n→∞’
is used to stand for ‘n increases without bound’; it means no more
and no less than this. If we wish, we can read ‘n→∞’ as ‘n tends to
infinity’; but, if we do, we must remember that ‘∞’ or ‘infinity’ has
no meaning by itself. Certainly, we must avoid any suggestion
whatever that n can be put equal to ‘∞’.

There seems to be little difficulty with any of the five examples.
The existence or otherwise of a limit is evident both from the form of
f(n) and from the graph. A term like (1/2)ⁿ or (3/4)ⁿ, and more generally
rⁿ for a given positive number r<1, gets smaller and approaches the
limit zero. On the other hand, a term rⁿ where r>1, e.g. (3/2)ⁿ, gets
larger and increases without bound. Equally, terms like 1/n or 1/n² tend
to zero, terms in n or n² increase without bound. As n→∞, the
conclusions appear:


(i) f(n)=(1/2)ⁿ⁻¹→0 steadily, i.e. Lim_{n→∞} (1/2)ⁿ⁻¹=0

(ii) f(n)=2{(3/2)ⁿ−1}→∞, increasing without bound

(iii) f(n)=4{1−(3/4)ⁿ}→4 steadily, i.e. Lim_{n→∞} 4{1−(3/4)ⁿ}=4

(iv) f(n)=(4/7){1−(−3/4)ⁿ}→4/7 through oscillations,
i.e. Lim_{n→∞} (4/7){1−(−3/4)ⁿ}=4/7

(v) f(n)=1 (n odd) and 1/n (n even) oscillates with no limit.


These are all confirmed and illustrated by the graphs. 
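The behaviour of the five sequences can also be tabulated in a short Python sketch, using exact fractions. The function names are hypothetical. The rule taken for (v), namely 1 for n odd and 1/n for n even, is an assumption read from the plotted terms. A table of values for a few n suggests the behaviour; it does not establish a limit.

```python
from fractions import Fraction


def f_i(n: int) -> Fraction:
    return Fraction(1, 2) ** (n - 1)


def f_ii(n: int) -> Fraction:
    return 2 * (Fraction(3, 2) ** n - 1)


def f_iii(n: int) -> Fraction:
    return 4 * (1 - Fraction(3, 4) ** n)


def f_iv(n: int) -> Fraction:
    return Fraction(4, 7) * (1 - Fraction(-3, 4) ** n)


def f_v(n: int) -> Fraction:
    # Assumed rule for example (v): 1 for odd n, 1/n for even n.
    return Fraction(1) if n % 2 else Fraction(1, n)


sequences = {"(i)": f_i, "(ii)": f_ii, "(iii)": f_iii, "(iv)": f_iv, "(v)": f_v}

# Print each sequence at a few increasing values of n.
for name, f in sequences.items():
    print(name, [float(f(n)) for n in (1, 2, 5, 20, 49, 50)])
```

In the printed rows, (i) shrinks towards 0, (ii) grows without bound, (iii) rises towards 4, (iv) lies alternately above and below 4/7, and (v) keeps returning to 1 while its even terms shrink.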

Two points need to be stressed. First, we have simplified matters 
by writing the sequence f(n) as n takes integral values 1, 2, 3,.... 
There is no corresponding simplicity in the values of f(n); these are 
real numbers, not integers. In the five examples, it happens that the 
values of f(n) are rationals, but irrationals can easily appear, e.g. if 
f(n) involves (√2)ⁿ=2^(n/2). We must expect all the difficulties associated
with the real number system to arise in defining limits. Second, 
there are clearly several different cases to watch for; some are 
represented in the examples and there may well be others. For this 
reason alone, we must proceed with great care if we are not to over- 
look something. In any case, for such a basic concept, we must make 
the definition precise and the development systematic. 


9.5. The limit process. As is shown formally in 15.4, it is not easy to 
achieve precision; we have to go down very far into the fundamental
ideas of what a limit process is. The immediate object here is to try 
out these ideas in this particular, and simplified, case of the limit 
of a sequence. 

In considering limits for a function y =f (x), we have to specify two 
things. The first is a set of stages for handling the variation of x. This 
is easy enough in the present case of the sequence f(n) for 
n=1, 2,3, .... The stages are a countably infinite set, i.e. the sequence 
of integers 1, 2, 3, ..., or better still the sequence of sets: 


Stage I=set of integers p≥1; Stage II=set of integers p≥2; ...

and generally:

Stage N=set of integers p≥n.

[Fig. 9.5: f(n)=4{1−(3/4)ⁿ}]

The successive stages are marked off as the segments I, II, III, ...
along On in the diagram. The second specification is the limit process
to which the values of y=f(x) are subject as x varies through its
stages. In the present case, consider all the values y=f(p) for stage
N (p≥n), making up a set Y of real numbers. If the values are not
bounded, there is no limit process. If they are bounded, then (by the
complete ordered property of real numbers) the set Y has a lower
bound and so a GLB cₙ, and it has an upper bound and so a LUB dₙ.
All y=f(p) of Y are contained in the interval [cₙ, dₙ], i.e. cₙ≤y≤dₙ,
and this is the smallest such interval.* Denote it by F(N): the least
interval containing all f(p) for stage N (p≥n). Hence, to the sequence
of stages I, II, III, ... N, ... there corresponds a sequence of intervals
F(I), F(II), F(III), ... F(N), .... By the definition, each interval
is contained in preceding intervals, i.e. the sequence F(N) forms a
shrinking nest of intervals. Fig. 9.5 illustrates two cases. Hence:

DEFINITION: For f(n) as n increases without bound and for a sequence
of stages N (integers p≥n), a limit process exists if f(n) is bounded in
each N. It is the nest of decreasing intervals F(N), where F(N) is the
smallest interval containing all f(p) over stage N (p≥n).

* Note that the interval [cₙ, dₙ] comprises all real numbers y such that cₙ≤y≤dₙ.
Amongst them are the particular real numbers y=f(p) for p≥n. Generally there are
many other real numbers in the interval.


Consider now the intersection of all intervals F(N), i.e. the set
composed of real numbers common to all F(N). Let F(N)=[cₙ, dₙ]
and let F(M)=[cₘ, dₘ] for a later stage (m>n). Then F(M) is
contained in F(N), i.e. cₘ≥cₙ and dₘ≤dₙ. Hence, as we advance
through the stages, cₙ increases and, since it has an upper bound
(e.g. any dₙ), it has a LUB c; similarly, dₙ decreases with a GLB d.
Hence an interval [c, d] is defined, either one point (c=d) or a finite
interval (c<d). Any point common to all F(N) must belong to
[c, d]. For, if y is one such, then cₙ≤y≤dₙ for all n. So, y is an upper
bound of the set of cₙ’s, i.e. y≥LUB c. Similarly, y≤GLB d. Hence,
c≤y≤d and y belongs to [c, d]. This development, which uses the
complete ordered property of real numbers, establishes the important
and powerful result:


THEOREM: If the limit process F(N) over stages N exists, then the
intersection of all intervals F(N) is itself an interval F=[c, d], where
c≤d, called the final residue of the limit process.


It is now clear that there are only three possibilities, three different
things which can happen to f(n) as n increases without bound:

I No limit process exists; the values of f(n) are not bounded.
II A limit process exists and the final residue F=[c, d] is a finite
interval (c<d); we say that the limit of f(n) does not exist.
III A limit process exists and the final residue F is the single real
number L (c=d=L); we say that f(n) has limit L.

DEFINITION: As n increases without bound, f(n) is convergent to the
limit L if the limit process F(N) over stages N exists and if the final
residue F consists of a single real number L. Write:

Lim_{n→∞} f(n)=L or f(n)→L as n→∞.
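The nest of intervals F(N) can be imitated in a Python sketch, with one important reservation. A stage N contains infinitely many integers p≥n, while a program can inspect only finitely many; the sketch therefore cuts each stage off at a hypothetical upper value called horizon, so that the computed end-points are the least and greatest of the values inspected and need not equal the true GLB and LUB. The function names, and the rule taken for the oscillating example (1 for odd n, 1/n for even n), are assumptions made for illustration.

```python
from fractions import Fraction


def f_iv(n: int) -> Fraction:
    return Fraction(4, 7) * (1 - Fraction(-3, 4) ** n)


def f_v(n: int) -> Fraction:
    # Assumed rule: 1 for odd n, 1/n for even n.
    return Fraction(1) if n % 2 else Fraction(1, n)


def stage_interval(f, n: int, horizon: int):
    """Smallest interval containing f(p) for n <= p <= horizon (a truncated stage)."""
    values = [f(p) for p in range(n, horizon + 1)]
    return min(values), max(values)


HORIZON = 200
nest_iv = [stage_interval(f_iv, n, HORIZON) for n in range(1, 41)]
nest_v = [stage_interval(f_v, n, HORIZON) for n in range(1, 41)]

# Width of the interval at stage 40 in each case.
width_iv = nest_iv[-1][1] - nest_iv[-1][0]
width_v = nest_v[-1][1] - nest_v[-1][0]
```

For f_iv the intervals shrink around 4/7 and the width at stage 40 is already very small. For f_v the upper end stays at 1 and the lower end is the smallest even-term value inspected, 1/200; the true GLB of the full stage is 0, which is not attained and which no finite horizon can reach.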


The possibilities are illustrated by the examples of 9.4. Example (ii)
has f(p)=2{(3/2)ᵖ−1}, not bounded in any stage N (p≥n), i.e. case I.
Here, f(n) increases without bound as n increases without bound.
This can be written:

2{(3/2)ⁿ−1}→∞ as n→∞.

This notation means no more and no less than the statement just
made. Case II is illustrated by example (v) where f(p)=1 (p odd) and
f(p)=1/p (p even), i.e. F(N) is the interval [0, 1] at every stage
N (p≥n). Lim f(n) does not exist. Case III is illustrated by the other
examples which give limits:

(i) Lim_{n→∞} (1/2)ⁿ⁻¹=0; (iii) Lim_{n→∞} 4{1−(3/4)ⁿ}=4; (iv) Lim_{n→∞} (4/7){1−(−3/4)ⁿ}=4/7.


It is, of course, case III which is the one of main interest. 

As far as the limit of a sequence f(n) is concerned, we are now 
through. From the strict definition put forward, we know precisely 
what a limit means. In particular we have isolated the three possibilities:
I f(n) not bounded; II f(n) varying within a finite interval
without converging; III f(n) converging to a single real number L.
We have a complete set of categories; we know we are overlooking 
nothing. In practice, we need not take long to reach a conclusion. 
Given a function f(n), we first see whether or not f(n) is bounded. 
If f(n) is not bounded, we write f(n)→∞ as n→∞ as a matter of
notation.* If f(n) is bounded, we check whether or not the intervals
containing f(p) for p≥n shrink down to a single number L as n
increases. If they do, we write f(n)→L as n→∞. We need, here, to
keep an eye open for the ‘odd case out’ where f(p) oscillates in a finite
interval for p≥n, no matter how large n; this is the case of no limit.

Various necessary and sufficient conditions for a limit can be 
deduced from the definition given here. One is often used: 


THEOREM: f(n) converges to the limit L as n→∞ if and only if, for a
given positive number ε (however small), there is a stage N (p≥n) such
that:

L−ε≤f(p)≤L+ε for all p≥n.


Directly: if f(n)→L as n→∞, there must come a stage N when the
shrinking interval F(N), comprising f(p) for p≥n, is so small around
L that it is contained in the given interval [L−ε, L+ε], no matter
how small ε is. Hence L−ε≤f(p)≤L+ε for p≥n. Conversely:
if n can be found so that L−ε≤f(p)≤L+ε for given ε (however
small) and for p≥n, then f(n) is bounded and a limit process F(N)
exists. As the smallest interval containing f(p) for p≥n, F(N) is
contained in [L−ε, L+ε]. By choosing ε small enough (and n large
enough to match), [L−ε, L+ε] excludes any specified number ≠L,
i.e. F(N) can be made to exclude any number ≠L. The final residue
F can include only L, i.e. f(n)→L as n→∞. Q.E.D.

* Or we can write f(n)→−∞ as n→∞ if f(n) takes unbounded negative values.
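The search for a stage matching a given ε can be sketched in Python, again with a reservation. The condition concerns all p≥n, which no finite computation can cover; the sketch inspects only p up to a hypothetical value horizon, so a stage reported by it is evidence, not proof. The names first_stage, f_iii and f_v are illustrative, with f_iii(n)=4{1−(3/4)ⁿ} and f_v the assumed rule 1 for odd n, 1/n for even n.

```python
from fractions import Fraction


def f_iii(n: int) -> Fraction:
    return 4 * (1 - Fraction(3, 4) ** n)


def f_v(n: int) -> Fraction:
    # Assumed rule: 1 for odd n, 1/n for even n.
    return Fraction(1) if n % 2 else Fraction(1, n)


def first_stage(f, L, eps, horizon: int):
    """Smallest n with L-eps <= f(p) <= L+eps for every p from n to horizon.

    Returns None when even f(horizon) lies outside the band.
    """
    n_found = None
    # Work backwards from the horizon while the terms stay inside the band.
    for n in range(horizon, 0, -1):
        if abs(f(n) - L) <= eps:
            n_found = n
        else:
            break
    return n_found


stage_iii = first_stage(f_iii, 4, Fraction(1, 100), 200)
stage_v = first_stage(f_v, 1, Fraction(1, 4), 200)
```

For f_iii with L=4 and ε=1/100 the sketch reports stage 21, since 4(3/4)ⁿ first falls to 1/100 or below at n=21. For f_v with L=1 and ε=1/4 it reports no stage, as the even terms keep falling outside the band.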

This theorem gives conditions which are necessary and sufficient, 
i.e. which are equivalent to the definition of a limit given here. The 
conditions are, in fact, those often offered as the definition of a limit, 
a perfectly valid procedure. However, they are neither as clear con- 
ceptually, nor yet as practically useful, as the definition adopted 
above. It is no easy matter to put up various (small) values of ε and 
then to check for each whether an appropriate (and perhaps very 
large) integer n does exist. See 9.9, Ex. 21. 


9.6. Limits of functions. After this preliminary survey, we turn to 
the main question: given a function y=f(x) defined on a domain X
of real numbers, how do we define the limit of y as x approaches a
particular value α? It might be thought that there is no problem here,
that the limit is just f(α). This jumps to conclusions too quickly (see
9.7 below). The next idea might be to write a sequence of x’s converging
on α, and use the concept of a limit of a sequence already
developed. But this won’t do. The integers are a countable set, a 
sequence. The real numbers are not countable and a sequence of 
them does not cover all their properties. We must face this added 
complication, not avoid it. Specifically, the stages of any limit pro- 
cess must be re-defined to allow for variation of real numbers and 
not just for a sequence of integers. This is a problem which has 
bothered mathematicians since Newton and Leibniz developed the 
calculus in the seventeenth century. The approach adopted here is 
that of Moore and Smith (1922).* 

The real-valued function y=f(x) is considered for real values of 


* E. H. Moore and H. L. Smith: ‘A General Theory of Limits’, American Journal of 
Mathematics, Vol. 44, p. 102. 


6] LIMITS AND CONTINUITY 251 


« around a specified value « and with a view to defining Lim f(z).* 


Strictly we need assume no more than that the domain of x is such
that every neighbourhood of α contains some elements of the domain.
For convenience, and with little loss of generality, we assume that
f(x) is defined on an interval [a, b] of real numbers containing α
within it, i.e. for a≤x≤b where a<α<b. We allow the exception
that f(x) may not be defined at x=α. A wider domain such as all x
or x≥a would do equally well. A neighbourhood of α is denoted
generally as N=[aₙ, bₙ] where a≤aₙ<α<bₙ≤b. The set S of all
possible neighbourhoods N provides the stages for x approaching α
through real numbers. S is essentially a non-countable set and any
sequence of neighbourhoods (e.g. a contracting nest of intervals) is a
very special subset. Among the neighbourhoods in S are some pairs,
N₁ and N₂, such that N₂ is contained in N₁. If N₁ is [aₙ₁, bₙ₁] and
N₂ is [aₙ₂, bₙ₂], then aₙ₁≤aₙ₂<bₙ₂≤bₙ₁. If N₁ and N₂ are so related, we
write N₂>N₁ and say that N₂ is more advanced than N₁. The relation
‘>’ or ‘more advanced’ simply means ‘contained in’. It is a transitive
relation, so that, if N₂>N₁ and N₃>N₂, then N₃>N₁. On the other
hand, not all neighbourhoods of S are so related. Any two of them,
N₁ and N₂, must overlap (not disjoint) since they all contain α; but
one need not be contained in the other. The important property of
S is that, since N₁ and N₂ overlap, there must be a third neighbourhood
N₃ contained in both of them, as in Fig. 9.6a. Formally:


Fig. 9.6a


If N₁ and N₂ are any two neighbourhoods of α, then there exists a
third neighbourhood N₃ so that N₃>N₁ and N₃>N₂.
As stages, the neighbourhoods N are such that, whatever pair we 
pick, we can always find another which is more advanced than either, 
i.e. contained in both. 

* As a special case, take the limit of f(x) as x increases without bound (x→∞).
Here we can either extend the idea of the limit of f(n) as n→∞ or put x=1/p in f(x) and
let p→0. See 9.9 Ex. 23, 29 and 30.

A limit process can now be specified very much as before, using the
complete ordered property of real numbers. If y=f(x) is not bounded
in a neighbourhood N, then there is no limit process. If it is bounded
in every neighbourhood N=[aₙ, bₙ], then the bounded values
assumed by y for x in N are confined in a smallest interval [cₙ, dₙ].
Hence, if aₙ≤x≤bₙ, then cₙ≤y≤dₙ. Write F(N)=[cₙ, dₙ] corresponding
to N=[aₙ, bₙ]. From this definition, it follows at once that,
if N₂ is more advanced than (contained in) N₁, then the interval
F(N₂) is contained in the interval F(N₁). So:


DEFINITION: For y=f(x) as x approaches α and for stages N=[aₙ, bₙ]
as neighbourhoods of α, a limit process exists if f(x) is bounded in each
N. It is the set of smallest intervals F(N)=[cₙ, dₙ] such that cₙ≤y≤dₙ
when aₙ≤x≤bₙ, and with the property:

    if N₂>N₁ then F(N₂) is contained in F(N₁).
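
The nesting property in this definition can be checked numerically. The following is only a sketch: the helper `image_interval` is a hypothetical name, and it assumes that a finite grid of sample points is an adequate stand-in for all real x in N, which illustrates the idea but proves nothing.

```python
def image_interval(f, a_n, b_n, samples=10_001):
    """Approximate F(N) = [c_n, d_n] for the stage N = [a_n, b_n].

    Assumption: a finite grid of sample points stands in for "all real x
    in N", so the result only approximates the smallest interval.
    """
    step = (b_n - a_n) / (samples - 1)
    values = [f(a_n + i * step) for i in range(samples)]
    return min(values), max(values)


def f(x):
    return x * (x + 1)


# Three stages around alpha = 2, each contained in the one before it.
stages = [(1.0, 3.0), (1.5, 2.5), (1.9, 2.1)]
for a_n, b_n in stages:
    c_n, d_n = image_interval(f, a_n, b_n)
    print(f"N = [{a_n}, {b_n}]   F(N) = [{c_n:.4f}, {d_n:.4f}]")
```

Each printed interval should lie inside the one before it and contain the value 6, as the definition requires of more advanced stages.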


Fig. 9.6b illustrates for a simple function y=x(x+1) defined on
x>0 and considered around x=2, y=6. Three neighbourhoods N₁, N₂
and N₃ are shown, with N₃>N₁ and N₃>N₂. The corresponding F(N₁),
F(N₂) and F(N₃) have the property that F(N₃) is contained in F(N₁)
and F(N₂).

The essential result, very simple and yet very powerful, is obtained as
before, by use of the complete ordered property of real numbers:

THEOREM: If the limit process F(N) over stages N exists, then the
intersection of all intervals F(N) is itself an interval F=[c, d], where
c≤d, called the final residue of the limit process.


Fig. 9.6b

Proof: any two neighbourhoods N₁ and N₂ must overlap and contain
a third, and any two intervals F(N₁) and F(N₂) must overlap and
contain a third. The lower end of one interval F(N₁) can be above
the upper end of another F(N₂) only if these intervals are disjoint,
and this is ruled out. If F(N)=[cₙ, dₙ], then all cₙ’s (as N varies) are
less than or equal to all dₙ’s. Since the whole set of cₙ’s is bounded
above, there is a LUB c. Since the whole set of dₙ’s is bounded below,
there is a GLB d, and c≤d. Hence an interval F=[c, d] is defined,
either a single point (c=d) or a finite interval (c<d). It remains to
show that all y common to all F(N) are in F. Take any such y:
cₙ≤y≤dₙ for all N=[aₙ, bₙ]. Hence y is an upper bound of the set
of all cₙ’s and so y≥LUB c. Similarly, y≤GLB d. Hence c≤y≤d
and y belongs to F=[c, d]. Q.E.D.

The definition of a limit then follows:


DEFINITION: As x approaches α, f(x) is convergent to the limit L if
the limit process F(N) over stages N exists and if the final residue F
consists of a single real number L. Write:

    Lim_{x→α} f(x)=L   or   f(x)→L as x→α.


As a consequence of the definition, necessary and sufficient conditions 
for a limit can be laid down as follows: 


THEOREM: f(x) converges to the limit L as x→α if and only if, for a
given positive number ε (however small), there is a stage N=[aₙ, bₙ]
such that:

    L−ε<f(x)<L+ε for all x of N (aₙ≤x≤bₙ).

The proof is similar to that of the corresponding result of 9.5 above.
Since these are necessary and sufficient conditions, i.e. equivalent to
the definition of a limit, it is perfectly in order to use them alternatively
as the definition.
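
The condition of this theorem can be made concrete for the function y=x(x+1) at x=2, where L=6. The sketch below is an illustration only: `stage_half_width` is a hypothetical helper, and it restricts attention to symmetric stages [2−h, 2+h], which is a simplifying assumption and not part of the theorem.

```python
import math


def f(x):
    return x * (x + 1)


def stage_half_width(eps):
    """Return h > 0 such that 6 - eps < f(x) < 6 + eps on [2 - h, 2 + h]."""
    # f is increasing for x > 0, so on [2 - h, 2 + h] (h < 2) its values lie
    # between f(2 - h) = 6 - 5h + h^2 and f(2 + h) = 6 + 5h + h^2.
    # It is therefore enough that 5h + h^2 < eps.
    h_max = (-5 + math.sqrt(25 + 4 * eps)) / 2  # positive root of h^2 + 5h = eps
    return min(h_max / 2, 1.0)  # stay strictly inside, and keep h < 2


for eps in (1.0, 0.1, 0.001):
    h = stage_half_width(eps)
    print(eps, h, f(2 - h), f(2 + h))
```

However small ε is taken, a positive h is returned, which is what the theorem asks for.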

There are just three possibilities to consider and to distinguish 
carefully : 


  I   No limit process exists for f(x) at x=α. The values of f(x) are
      not bounded in neighbourhoods of x=α.

  II  A limit process exists for f(x) at x=α and the final residue
      F=[c, d] is a finite interval (c<d) so that Lim_{x→α} f(x) does not
      exist.

  III A limit process exists for f(x) at x=α and the final residue F
      is the single real number L so that Lim_{x→α} f(x) exists and
      equals L.
Several examples illustrate: 
(i) y=1/(1−x) defined on all x (x≠1). No limit process exists at x=1
since neighbourhoods (excluding x=1 itself where y is not defined)
are such that y is not bounded. Here y increases without bound
(numerically, through positive or negative values) as x approaches 1.
We write: 1/(1−x)→±∞ as x→1, but this notation means no more
than the statement given.

(ii) y=least integer not less than x defined on x>0. This is the
step-function graphed in Fig. 9.6c (as in 8.6). A limit process exists
at x=1. Write N=[aₙ, bₙ] where 0<aₙ<1 and bₙ>1. Then N gives
the interval of y’s: F(N)=[1, 4] if 3<bₙ≤4; F(N)=[1, 3] if
2<bₙ≤3; F(N)=[1, 2] if 1<bₙ≤2. Hence the final residue F=[1, 2].
There is no limit of y as x→1.

Fig. 9.6c
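
The intervals F(N) of example (ii) can be tabulated directly. This is a sketch under the assumption that Python's `math.ceil` represents ‘least integer not less than x’; the helper name `ceiling_image` is hypothetical.

```python
import math


def ceiling_image(a_n, b_n):
    """F(N) for y = ceil(x) on N = [a_n, b_n], assuming 0 < a_n < 1 < b_n.

    The step-function never decreases, so the smallest interval holding
    its values on N runs from ceil(a_n) to ceil(b_n).
    """
    return math.ceil(a_n), math.ceil(b_n)


# Stages closing in on x = 1: the intervals settle at [1, 2], not a point.
for a_n, b_n in [(0.5, 3.5), (0.9, 2.5), (0.99, 1.5), (0.999, 1.001)]:
    print((a_n, b_n), ceiling_image(a_n, b_n))
```

The intervals stop shrinking at [1, 2], so the final residue is an interval and possibility II applies.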


(iii) y=(1−x²)/(1−x) defined on all x (x≠1). If x≠1,

    y=(1−x)(1+x)/(1−x)=1+x.

If x=1, y is not defined. Hence the function is the same as the linear
function 1+x, except that the point x=1 must be omitted. The graph
(Fig. 9.6c) is the line y=1+x with a gap at x=1. A limit process
exists at x=1 and neighbourhoods N (excluding always x=1 itself)
are such that the intervals F(N) always cover y=2 and contract
onto this value. Hence Lim_{x→1} (1−x²)/(1−x)=2. More generally:

    Lim_{x→α} (αⁿ−xⁿ)/(α−x)=nαⁿ⁻¹

as can be seen by dividing α−x into αⁿ−xⁿ.
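
The division of α−x into αⁿ−xⁿ can be checked with exact rational arithmetic. The sketch below assumes the standard expansion αⁿ⁻¹+αⁿ⁻²x+…+xⁿ⁻¹ for the quotient; the function names are illustrative only.

```python
from fractions import Fraction


def quotient(alpha, x, n):
    """(alpha^n - x^n) / (alpha - x), defined only for x != alpha."""
    return (alpha**n - x**n) / (alpha - x)


def expanded(alpha, x, n):
    """alpha^(n-1) + alpha^(n-2) x + ... + x^(n-1), defined for every x."""
    return sum(alpha ** (n - 1 - k) * x**k for k in range(n))


alpha, n = Fraction(3), 4
for x in (Fraction(5, 2), Fraction(29, 10), Fraction(2999, 1000)):
    # The two forms agree wherever the quotient is defined.
    print(x, quotient(alpha, x, n) == expanded(alpha, x, n), float(expanded(alpha, x, n)))

# Putting x = alpha in the expanded form gives n * alpha^(n-1).
print(expanded(alpha, alpha, n), n * alpha ** (n - 1))
```

The expanded form has n terms, each tending to αⁿ⁻¹ as x→α, which accounts for the value of the limit.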

(iv) y=1 (x integral) and y=−1 (x non-integral) defined on all x.
The graph is reproduced in Fig. 9.6c exactly as in 9.2 above. Let
N=[aₙ, bₙ] be a neighbourhood of x=1 (excluding x=1 itself). Then
the interval of y’s: F(N)=[−1, 1] if aₙ≤0 or bₙ≥2; F(N) is the
single number −1 if 0<aₙ<1 and 1<bₙ<2. Hence the limit process
exists with final residue F=−1, i.e.

    Lim_{x→1} y=−1 whereas y=1 at x=1.

Contrast with the function:

    y=1 (x rational) and y=−1 (x irrational).

The graph looks like two lines y=1 and y=−1 but, in fact, if there is
a point on one line, there is no point on the other at a given x. Here,
for all neighbourhoods N of x=1, the interval of y’s is F(N)=[−1, 1].
There is no limit.

(v) y=1−∛(x−1)² defined on 0<x<2. It is easily seen that a
limit process exists at x=1 and that the final residue is 1. Hence
Lim_{x→1} {1−∛(x−1)²}=1. Notice that y=1 at x=1 also.


(vi) y=x(x+1) defined on x>0. This is the usual kind of well-behaved
function, the one used above to illustrate the limit process.
For x=2, y=6 and Lim_{x→2} y=6, at P on the curve in Fig. 9.6c. A similar
result holds for other values of x, at other points on the curve.

Of the three possibilities, I is illustrated by example (i), II by (ii)
and III by the considerable range of cases of (iii)-(vi).


9.7. Continuity. Some of the cases where a limit exists are of more
interest, and more useful, than others. In particular, the last two
cases (v) and (vi) of 9.6 can be separated off from the others, by the
property that the limit of f(x) exists at x=α and that the limit is the
same as f(α). It is this property which serves to describe what we
mean by a ‘continuous’ function. So:

DEFINITION: f(x) is continuous at x=α if (1) Lim_{x→α} f(x) exists, (2)
f(α) is defined and (3) Lim_{x→α} f(x)=f(α).


Continuity is a property of a function at a particular value, of a curve
at a particular point. It represents the agreement between limit and
value where both exist. As examples, (v) and (vi) are continuous at
x=1 and x=2 respectively. On the other hand, (i)-(iv) are all discontinuous
at x=1, (i) because neither the function nor the limit is
defined there, (ii) because there is no limit, (iii) because the function
is not defined and (iv) because the limit and value (both existing) are
different. As a natural extension, f(x) is continuous over the domain X
if it is continuous for each x of X. Then (v) and (vi) are continuous
everywhere; (i) and (iii) are continuous except at x=1; (ii) and (iv)
have each a countably infinite number of discontinuities.

It must be noted that continuity is essentially a characteristic of 
the real number system. It is pointless to ask whether f(n), a function 
of integral n, is continuous or not. Hence a variable such as n (integral 
values) is called a discrete variable while x taking all real numbers 
(e.g. in an interval) is a continuous variable. The term continuum is 
sometimes applied to the set of all real numbers, to the set of all 
points on a directed line. 


In conclusion, we return to the question of whether Lim_{x→α} f(x) can
be defined or found by means of a sequence of intervals or values of x
converging on α. If the limit L is known to exist, any sequence of
neighbourhoods of α (or of particular values within them) which
shrink down to α must define a sequence of intervals (or values) of
f(x) which converges to L. This is necessarily so. But it is not sufficient
to establish that the limit exists. A sequence is not enough when a
strict development of the limit concept is attempted, or in
practice when the existence of a limit is in question. It may well do
in practice, however, when we are sure that the limit exists. The
following calculations illustrate.

Consider f(x)=x(x+1) at x=2. Write x=2−h and x=2+k to get:

    f(2−h)=(2−h)(3−h)=6−5h+h²;
    f(2+k)=(2+k)(3+k)=6+5k+k².

It is easily established that the neighbourhood [2−h, 2+k] of x=2
gives f(x) in the interval [6−5h+h², 6+5k+k²]. The limit process
for x→2 is now expressed as letting h and k take, quite separately,
smaller and smaller positive real values. The limit is seen to exist
and to be 6. Now short-circuit the process by taking intervals
[2−h, 2+h] and by specifying only a sequence of h’s, by writing
h=1/n, where n=1, 2, 3, ... . So:

    f(2−1/n)=6−5/n+1/n²;   f(2+1/n)=6+5/n+1/n².

Both tend to 6 as n→∞. Hence, if the limit of f(x) as x→2 is known
to exist, then it is 6. But this sequential process does not establish
that the limit exists. It is easy to produce a case where no limit exists
but where a sequential process suggests one. Consider the function
y=1 (x rational) and y=−1 (x irrational) at, say, x=2. Then
f(2−1/n)=f(2+1/n)=1 (all n) since 2±1/n are rational. Both have limit
1 as n→∞, suggesting that f(x)→1 as x→2. There is, in fact, no such
limit.
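
Both calculations can be reproduced exactly. In the sketch below, `rational_indicator` is a hypothetical helper, and the device of writing x as p+q√2 with rational p and q is an added assumption, adopted only so that a rational x can be told from an irrational one without rounding. The second sequence, 2+√2/n, is not in the text; it is included to show why the rational sequence alone is misleading.

```python
from fractions import Fraction


def f(x):
    return x * (x + 1)


def rational_indicator(p, q):
    """Value of y at x = p + q*sqrt(2), with p and q rational.

    Such an x is rational exactly when q == 0, so y = 1 if q == 0, else -1.
    """
    return 1 if q == 0 else -1


for n in (1, 10, 100, 1000):
    h = Fraction(1, n)
    # Exact values 6 - 5/n + 1/n^2 and 6 + 5/n + 1/n^2: both approach 6.
    print(n, f(2 - h), f(2 + h))
    # x = 2 + 1/n is rational (y = 1); x = 2 + sqrt(2)/n is irrational (y = -1).
    print(n, rational_indicator(2 + h, 0), rational_indicator(Fraction(2), h))
```

Both sequences of x converge on 2, yet one gives y=1 throughout and the other y=−1 throughout, so neither sequence can settle the question of a limit.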


9.8. Properties of limits and continuity. If limits exist for several
functions f(x), g(x), ... at x=α, it can be shown that the limit of a
combination of the functions is the combination of the separate
limits. Writing Lim for Lim_{x→α}:

    Lim {f(x)+g(x)}=Lim f(x)+Lim g(x);
    Lim {f(x)−g(x)}=Lim f(x)−Lim g(x);
    Lim {f(x)×g(x)}=Lim f(x)×Lim g(x);
    Lim {f(x)/g(x)}=Lim f(x)/Lim g(x)   for Lim g(x)≠0.

Here it must be checked that the domains of both f(x) and g(x) are
appropriate to the definition of limits as x→α, e.g. that f(x) and
g(x) are defined on an interval containing α. Further, for a composite
function:

    Lim F{f(x)}=F{Lim f(x)}

provided that F(u) is defined on an interval containing u=L and is
continuous at u=L, where L=Lim f(x).




The proofs of these results follow from the definition of a limit but
they need to be laid out formally. One proof, that for the sum result,
is as follows. Suppose that f(x) and g(x) are defined on some interval
containing α, f(x) converging to L and g(x) to L′ as x→α. The same
set of stages N can be used for each: N=[aₙ, bₙ] as a neighbourhood
of α on which f(x) and g(x) are defined. To N there corresponds a
smallest interval F(N)=[cₙ, dₙ] containing f(x) for x in N, and a
smallest interval G(N)=[cₙ′, dₙ′] containing g(x) for x in N. There is
also a smallest interval H(N) for values of h(x)=f(x)+g(x) for x in
N. Then H(N) is contained in the interval [cₙ+cₙ′, dₙ+dₙ′] formed
from F(N) and G(N). But F(N) has final residue L and G(N) final
residue L′, so that [cₙ+cₙ′, dₙ+dₙ′] converges to the single value
L+L′. Hence, there is for the function h(x) a limit process H(N)
over stages N and the final residue is L+L′. Lim h(x) exists and it is
L+L′, i.e. Lim {f(x)+g(x)}=Lim f(x)+Lim g(x). Q.E.D.
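
The containment used in this proof can be watched numerically. What follows is a sketch only: the functions f(x)=x² and g(x)=3−x at α=1 are chosen here purely for illustration, and the sampled helper `image_interval` (a hypothetical name) only approximates the smallest intervals.

```python
def image_interval(f, a_n, b_n, samples=2001):
    """Sampled approximation to the smallest interval holding f(x) on [a_n, b_n]."""
    step = (b_n - a_n) / (samples - 1)
    values = [f(a_n + i * step) for i in range(samples)]
    return min(values), max(values)


def f(x):
    return x * x      # L = 1 at alpha = 1


def g(x):
    return 3 - x      # L' = 2 at alpha = 1


def h(x):
    return f(x) + g(x)


for a_n, b_n in [(0.5, 1.5), (0.9, 1.1), (0.99, 1.01)]:
    c, d = image_interval(f, a_n, b_n)
    c2, d2 = image_interval(g, a_n, b_n)
    hc, hd = image_interval(h, a_n, b_n)
    # H(N) sits inside [c + c', d + d'], which closes in on L + L' = 3.
    print((a_n, b_n), (hc, hd), (c + c2, d + d2))
```

In each line the first interval lies inside the second, and both contract towards 3 as the stage advances.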

There are corresponding results for continuity. If f(x) and g(x) are
continuous at x=α, then f(x)+g(x), f(x)−g(x), f(x)×g(x) and f(x)/g(x)
are all continuous at x=α. In the last case, g(α)≠0 must be assumed.
Further, if y=F(u) is continuous at u=f(α) and if u=f(x) is continuous
at x=α, then the composite function y=F{f(x)} is continuous
at x=α. Similar results hold for functions which are continuous
over a domain X, i.e. continuous at each x of X.

These results follow from those for limits and only the last causes
any trouble. Write L=f(α) and M=F(L)=F{f(α)}. Then we work
backwards as follows. Using the conditions for a limit (theorem of
9.6), we assign an interval [M−ε, M+ε] for values of y, where we
take ε as small as we please. Since y=F(u) is continuous at u=L,
corresponding to y=M, there is an interval (neighbourhood) N′ of u
around L such that the corresponding values of y=F(u) are contained
in [M−ε, M+ε]. Since u=f(x) is continuous at x=α, corresponding
to u=L, there is an interval (neighbourhood) N of x
around α such that the values of u=f(x) are contained in N′. Fig.
9.8a illustrates. Hence, no matter how small ε, there is a neighbourhood
N of α so that M−ε<F{f(x)}<M+ε for all x of N. By the
theorem of 9.6 again, F{f(x)} converges to M=F{f(α)} as x→α, i.e.
F{f(x)} is continuous at x=α. Q.E.D.
Fig. 9.8a

In dealing with continuous functions, e.g. in problems of maxima
and minima, we make constant use of the following result. The
property is an important one but very simple; it is, indeed, so simple
that we tend to accept it without question. However, it is something
which needs to be established; since the proof is somewhat tricky
and tedious, it is left to 15.4 below. The result is:

THEOREM: If y=f(x) is continuous over the interval [a, b], then the
range of y is itself an interval [c, d].

It is essential to appreciate the import of this property of continuous
functions. It is not only that the values of y are encompassed by an
interval [c, d], but also that y ranges over the whole interval. For
any x in [a, b], the corresponding value of y is in [c, d]. Conversely,
if y is any value in [c, d], then there is some x in [a, b] which
corresponds. Fig. 9.8b illustrates. Two particular cases are important
in themselves. They refer to the attainment of smallest and largest
values, and to the consequences of a change in sign, of f(x) over
[a, b]. See 9.9 Exs. 31 and 32.
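
The consequence of a change in sign can be put to practical use. The sketch below uses repeated halving of the interval (the bisection method, which is not described in the text and is brought in here only as an illustration) on the function x²−2x−1 over [2, 3], the case quoted in 9.9 Ex. 32. It relies on the continuity of f; the name `bisect_root` is hypothetical.

```python
def bisect_root(f, a, b, tol=1e-10):
    """Locate x in [a, b] with f(x) = 0, assuming f continuous and f(a) < 0 < f(b)."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    while b - a > tol:
        m = (a + b) / 2
        if f(m) < 0:
            a = m   # sign change now lies in [m, b]
        else:
            b = m   # sign change now lies in [a, m]
    return (a + b) / 2


def f(x):
    return x * x - 2 * x - 1


root = bisect_root(f, 2.0, 3.0)
print(root, f(root))
```

Each halving keeps a change of sign inside the interval, so the theorem guarantees that a zero of f is never lost.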

Fig. 9.8b indicates that particular attention should be paid to
functions which are both continuous and increasing (or decreasing).
If y=f(x) is an increasing function defined on the domain A and with
range B, then the inverse y=f⁻¹(x) exists and it is an increasing
function defined on the domain B and with range A. The sets A and
B need not be intervals and the functions f(x) and f⁻¹(x) need not be
continuous (see Fig. 9.3). As a special case, we have the following
useful result, for an increasing function as in (iii) of Fig. 9.8b:

THEOREM: If y=f(x) is increasing and continuous over the interval
[a, b], then

  (1) the range of f(x) is the interval [c, d], where c=f(a) and d=f(b);

  (2) the inverse y=f⁻¹(x) exists, an increasing and continuous
      function over the interval [c, d], with the interval [a, b] as
      range.

This follows from the preceding theorem. A similar result, with
obvious variations as in (iv) of Fig. 9.8b, holds for a decreasing and
continuous function.

To conclude with some words of encouragement: it is necessary to
have a strict definition of a limit and hence of continuity of a function,
both to serve as a sound theoretical basis for the calculus and also as a
practical guide to the range of possibilities which can occur. However,
in practice, the ‘odd case out’ is usually clear enough, as at the
end of 9.7. Further, the writing of limits of simple functions is easy
enough in practice. More complicated functions need to be split into
combinations of simple functions; the results above then apply.
An illustration is given in 9.9 Ex. 29. Hence the final point: the
practice of limit evaluation is not at all difficult.

Fig. 9.8b


9.9. Exercises 


1. Show that y=1+x² can be defined for all x with range y≥1. Represent
graphically as a locus.
2. Functions in parametric form. Illustrate the use of parameters by showing 


2 
RET ORT Oe aca i. (i) and (ii) of 
ot? + Bar + 


b+eNz — 
9.2, and that (iii) of 9.2 can be included in y OPT OT ONE --α 


that the general ratio of quadratics y= 


Vx --β 




3. Extend the notion of parametric classing of functions by showing that a
more general step-function than (v) of 9.2 is y=Aᵣ for ra<x<(r+1)a (r=0,
1, 2, ...).

4. The parabola. Show that y=x², defined for all x, is a two-one mapping
with range y≥0. Limit the domain to x≥0 and show that the mapping is
one-one. Write the inverse function and show that both the function and
inverse are increasing. Generalise to y=ax² (parameter a≠0). See 8.9 Ex. 21
and 22.


5. Show that y₁=((x+a)²−a²)/x and y₂=x+2a take the same values for all x
except at x=0 where y₂=2a but y₁ is not defined.

6. Functions as mappings. Two ways of mapping a function (7.3 and 8.6)
are mentioned in 9.2. Link by showing that y=f(x) as a curve effects a mapping
of points on one line Ox onto points on another line Oy (at right angles) as in
Fig. 9.9a.

7. Exhibit function (ii) of 9.2 as a composite 
function F{f(x)} where f(x) =2z?. Put function (iv) of 
9.2 into the same form. Specify the domains and 
ranges concerned. 

8. By writing as functions of functions and checking 


3 
domains and ranges, show that y = oe J1+2 is 


defined for x>-1 but that y= —-Nl+a only 
for -l<a<0. 

9. Illustrate a practical use of functions by considering the following.
A firm makes open tin cans in the form of a cylinder of height 1 foot and
of variable radius x feet. The cost of production is £y per 1000 cans where
y=√u and where u is the surface area of the can in square feet. Show that,
as a function of a function, y=√{πx(x+2)} for x>0.

Fig. 9.9a

10. If x and y each take all values in the interval [−1, 1], the relation
x²+y²=1 is shown by a circle. Show that no function is defined on this
domain. Restrict the set from which x or y is drawn in various ways to get y
as a function of x or inversely, indicating which segments of the circle you use.
If the domain of x must be an interval, show that a one-one relation (and so a
function and inverse) arises only if the interval is some sub-interval either of
[−1, 0] or of [0, 1].

11. Show that the linear function y=mx+c (m≠0) is monotonic (either
increasing or decreasing) and hence can be inverted.

12. Show that y=√(x²+1) is defined for all x as a two-one relation, but that
it is a decreasing function on x≤0 and an increasing function on x≥0.


; hea 
13. Graph y ate defined on x >1. Show that the inverse is z= “τ 

= yt 
defined on y< -—1, both functions being increasing. 


14. Show that the function y=i2 (0<a”<1), =3(a-1) (2<a” <4) is in- 




creasing, but that y=}v (0<a<l1), =}0 -2 (2<%<4) is not. On the other 
hand, show that both have inverses, as (ii) and (iv) of Fig. 9.3. 

15. Strictly increasing functions. The definition of 9.3 is that of a strictly
increasing function, excluding all step-functions. Weaken to: f(x) is increasing
if a<b implies f(a)≤f(b) and show that step-functions are included.


16. Show that 5 : ae and 70 as n—>co. Represent graphically and 


show that the tendency is steady (through positive and negative values 
respectively), provided that n>2 in the second case. Show further that, 

as N—>00: 
2+" 2-n 
——_—--—>»} 3 ----- 
l+n l+n 

17. Generalise Ex. 16 by taking

    f(x)=(a₀xʳ+a₁xʳ⁻¹+…)/(α₀xˢ+α₁xˢ⁻¹+…)


_24+n 2-n 


vey eg 


-l 


where a₀≠0, α₀≠0. As n→∞, show that f(n)→0 if r<s, that f(n)→a₀/α₀ if r=s
and that there is no limit process, f(n)→±∞, if r>s.

18. Illustrate the case of a limit process but no limit by showing that
f(n)=(½)ⁿ+(−1)ⁿ tends to oscillate between ±1 as n→∞.

19. Geometric series. Establish formally that a+ar+ar²+…+arⁿ⁻¹ has
sum Sₙ→a/(1−r) if |r|<1, and that Sₙ→±∞ if |r|>1, as n→∞.
What can be said of the cases where r=±1?

20. Arithmetic progression. Show that there is no limit process for the
sum of n terms of a+(a+α)+(a+2α)+…: Sₙ=½n{2a+(n−1)α}→±∞ as
n→∞.
21, If f(n) =F{1 -- ἰ -- 3325}, show that $-—«</f(n)<#$+.e for all n>7 given 
«=0-1 but that we must take n >15 if «=0-01 is given. On the other hand, if 
J (n) =1/n?, the convergence (to zero) is more rapid. Show that, given ε, then 
n >1/,/e suffices for 0<f(n) <e, 6.5. «=0-1, 2,54; «=0-01, n2=10. 

*22. Area of a circle. The Euclidean definition of the area of a circle is the
limit of the area of a regular inscribed polygon of n sides (as n→∞). Hence,
for a circle of radius r and area A: A=Lim_{n→∞} ½nr² sin(2π/n) (see 2.9 Ex. 8).
It is still not established that the limit exists, still less what it is. But, given
(sin x)/x→1 as x→0 (x in radians, 180°=π radians), show that (n/2π) sin(2π/n)→1
as n→∞ and so A=πr². Archimedes (circa 250 B.C.) worked out π as between
223/71 and 22/7 for n=96; he was probably the only mathematician before
Newton and Leibniz in the seventeenth century with any real idea of a limit.

23. Show that the limit process for f(n) as n increases without bound
through integers (9.5) extends to the limit process of 9.6 for f(x) as x increases
without bound through real values and that the same three possibilities arise:
no limit process or f(x)→±∞; no limit; f(x)→L as x→∞.




24. As x→−1, find whether there are limits of (i) y=1+x; (ii) y=1−x;
(iii) y=(1−x²)/(1+x); (iv) y=1/(1+x); (v) y=1/(1−x), and (vi)
y=(1+x)/(1−x²). What is the difference between the functions (ii) and (iii),
and between (v) and (vi)?

25. Show that the function y=−1 (x<−1); y=x (−1≤x≤1); y=1 (x>1)
has a graph as in Fig. 9.9b. Examine from the point of view of limits and
continuity at x=±1.

26. For the functions of Ex. 5, show that y₁ and y₂ both have limit 2a
as x→0. Why is y₁ (unlike y₂) discontinuous at x=0?

27. Show that the function y=x (0≤x≤1); y=x−1 (2<x≤3) has inverse
x=y (0≤y≤1); x=y+1 (1<y≤2). Represent graphically. Why are these
functions (though defined) not continuous at x=1 and y=1 respectively?

28. If f(x) is continuous over [a, b], can we say that y=f(x) has inverse
x=f⁻¹(y) over [a, b] if and only if f(x) is monotonic? See Fig. 9.3.

29. Express f(x)=x/√(1+x²) as a function of u where u=1+1/x²→1
as x→∞. Deduce that f(x)→1 as x→∞.

30. Alternatively, put x=1/h and show that
x/√(1+x²)=1/√(h²+1)→1 as h→0 so that x/√(1+x²)→1 as x→∞.

31. Weierstrass’ Theorem. Establish from the theorem of 9.8 that:

If f(x) is continuous over [a, b] then f(x) attains its smallest value Inf f(x)
at some x (a≤x≤b) and similarly for its largest value Sup f(x). Illustrate by
reference to Fig. 9.8b and check graphically that y=x²−2x−1 over [0, 3] has
its smallest value at x=1 and its largest at x=3.

32. Bolzano’s Theorem. Establish further that:

If f(x) is continuous over [a, b] and such that f(a)<0 and f(b)>0 then
f(x)=0 for some x (a<x<b).

Illustrate graphically. Check that x²−2x−1=0 has a root between 2 and 3.

*33. The field of convergent sequences of rationals. A sequence of rational values
a₁, a₂, … aₙ, … is convergent if aₙ→a as n→∞, where a is some real value. Use
the theorem of 9.5 to express the condition for convergence: the sequence is
convergent if and only if, given a positive rational ε (however small), there is
an integer n such that |a_p−a_q|<ε for all integers p and q>n. So convergent
sequences of rationals can be handled without reference to their limits, i.e.
without reference to real numbers. In particular, they can be shown to be a
field. This is the basis of Cantor’s definition of a real number as a convergent
sequence of rationals, an alternative to Dedekind’s definition (2.4 and 15.1).


Fig. 9.9b


CHAPTER 10 


CALCULUS 


10.1. Some examples. A few simple examples serve to indicate both 
the nature of the main concepts of the calculus and their great range 
of applications. 

(i) Speed, velocity and rate of growth. A car is driven steadily at
60 m.p.h., travelling 88 feet every second. What is the distance x feet
covered in t seconds? The answer is easy: x=88t. Hence, in this particular
case of constant speed, we have the following simple functions
of time:

    Distance covered at time t: x=88t;   Speed at time t: v=88.

Here t is in seconds (from some starting time), x in feet, v in feet per
second.

Suppose, now, that the car is driven at a varying speed. Then there 
arise such questions as: what is the speed when + mile has been 
travelled? Or, how long does it take the car to accelerate from rest to 
60 m.p.h.? Whatever the speed may be, suppose observations are 
made of the distance x feet travelled for various elapsed times of t
seconds and suppose they disclose that x=2t², distance as a particular
function of time. Select some elapsed time t₀ seconds and perform the
simple calculations:

Distance travelled in 10 seconds from t₀

    =2(t₀+10)²−2t₀²=2t₀²+40t₀+200−2t₀²=(40t₀+200) feet.


Average speed over 10 seconds from t₀=(40t₀+200)/10=(4t₀+20)
feet per second. Other such calculations can be made and summarised:

    Time interval      Average speed u       Over time of
                       (ft. per sec.)
    [t₀, t₀+10]        u=4t₀+20              10 secs.
    [t₀, t₀+0.1]       u=4t₀+0.2             0.1 sec.
    [t₀, t₀+h]         u=4t₀+2h              h secs.

There is a practical difficulty which we have glossed over. We
have said that observations ‘disclose’ that x=2t², precisely; surely
they do no such thing? Actual readings may be limited to multiples
of 1/10th second and 1/10th foot. Then, strictly, we can do no more
than get to the 0.1 sec. row of the table above. However, a simple fact
stands out: the limiting speed or velocity* is 4t₀. We want to take
up the position that time t varies continuously, that distance x varies
continuously, that they are related by x=2t² precisely, and that
velocity v is given by: v=4t. Such a step from a practical situation
to a theoretical one is one commonly taken but it must be recognised
for what it is. In practice, we measure time (say) to 1/10th second;
conceptually, we have no qualms in assuming time varies continuously.
The same thing is true of distance or velocity. There is
then no residual difficulty in assuming a precise formula: x=2t² or
v=4t. Hence, provided that we make the appropriate assumptions,
we take the big step forward and write Lim_{h→0} u=v. We call v the
instantaneous speed or velocity at the specified elapsed time. So, at
elapsed time t₀:

    Average speed over [t₀, t₀+h]=4t₀+2h;   Velocity at t₀=4t₀.

The first depends on t₀ and h, the second on t₀ only. Hence, at time t:

    Distance travelled x=2t²;   Velocity v=4t

dropping the subscript 0. There are two functions of time t (in
seconds elapsed), one for x (in feet) and the other for v (in feet per
second). The second function is derived from the first; it is an
example of a ‘derivative’.

* Strictly, velocity is a vector quantity with magnitude and direction, where
‘magnitude’ corresponds to the length of a geometric vector (8.4). Here we speak only
of magnitude, e.g. for movement along a straight path.

The question we put can now be answered. After time t, the car
has accelerated to v=60 m.p.h.=88 feet per second. What is t?
Answer: 4t=v=88, i.e. t=22. The distance travelled is then
x=2(22)²=968. Hence, the car reaches 60 m.p.h. after travelling 968
feet in 22 seconds. This is on the basis of the observations from which
the formula x=2t² is obtained.
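
The calculations of average speed and the answer just obtained can be checked with exact arithmetic. This is a sketch assuming the formula x=2t² holds precisely; the function names are illustrative only.

```python
from fractions import Fraction


def distance(t):
    """Distance in feet after t seconds, on the assumption x = 2t^2."""
    return 2 * t * t


def average_speed(t0, h):
    """Average speed in feet per second over the interval [t0, t0 + h]."""
    return (distance(t0 + h) - distance(t0)) / h


t0 = Fraction(22)
for h in (Fraction(10), Fraction(1, 10), Fraction(1, 1000)):
    # Each value equals 4*t0 + 2*h, which approaches the velocity 4*t0 = 88.
    print(h, average_speed(t0, h))

print(distance(22))  # distance covered when the velocity reaches 88 ft. per sec.
```

As h is made smaller the average speed differs from 88 by exactly 2h, which is the sense in which 88 is the limiting speed at t=22.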

Fig. 10.1a

The functions can be plotted on two graphs (Fig. 10.1a). The first
curve shows how the distance x travelled increases with time t; after
elapsed time ON=t, the distance is NP=2t². The second curve (a line)
shows how velocity v increases with time t; after elapsed time
ON=t the velocity is NQ=4t. The second curve is here derived from
the first. Can we reverse the process, i.e. given the velocity
in terms of time, can we find the distance travelled? We have, at
least, a hint of how to obtain this ‘anti-derivative’, of how to perform
this ‘integration’. Given the point Q on the graph of the velocity
(v=4t at t), calculate:

    Area of triangle ONQ=½ON×NQ=½t×4t=2t²

which we recognise as the height of the graph
of distance at the same t. Hence, in this case, the height NP of the
distance curve (x=2t² at t) is obtained as the area under the velocity
line from O to t. This also fits in with the dimensions of the concepts
involved: distance travelled is speed×time. That is: we expect the
height of the distance graph to be related to areas on the velocity graph.
This is the fascinating relationship which will concern us later on.
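
The area under the velocity line can also be built up from thin vertical strips. The sketch below is an added illustration, not the method of the text: it takes each strip as a rectangle whose height is the velocity at the middle of the strip, a choice which happens to give the exact area for a straight line. The name `area_under_velocity` is hypothetical.

```python
from fractions import Fraction


def area_under_velocity(t, strips=100):
    """Sum of strip areas under v = 4t from 0 to t, using mid-strip heights."""
    dt = Fraction(t) / strips
    # Strip i runs from i*dt to (i+1)*dt; its mid-point is (2i + 1)*dt/2.
    return sum(4 * (dt * (2 * i + 1) / 2) * dt for i in range(strips))


for t in (1, 10, 22):
    print(t, area_under_velocity(t), 2 * t * t)
```

For each t the summed area agrees with the distance 2t², in line with the area of the triangle ONQ.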

Other such examples can be quoted. To take one more, suppose x
is the number (thousands) unemployed after t months from some
starting date and suppose observations disclose that x=2t². There is
the same problem of the step from practice to theory, since the numbers
unemployed may only be recorded monthly, for t=1, 2, 3, ... .
There is, again, no conceptual difficulty in assuming that unemployment
varies continuously over time. Making this assumption and
performing the same calculations as before, we get the average rate
of growth of unemployment over h months from month t as 4t+2h,
and hence the (instantaneous) rate of growth at month t as 4t. Hence,
after t months:

    Unemployed x=2t² (thousands);
    Rate of growth v=4t (thousands per month).

For example, after 10 months, the number unemployed is 200,000,
growing at the rate of 40,000 per month. The rate of growth is
the ‘derivative’ of the number; the number is the ‘anti-derivative’
or ‘integral’ of the rate of growth.

(ii) Average and marginal rate of change. A firm produces gadgets 
at a given average (unit) cost of £20 each in a plant with a capacity 
of 100 gadgets per week. Total cost is £20x ($0 \le x \le 100$) for a weekly
output of x gadgets. This is the particular case of fixed average cost.
On the other hand, suppose that total cost £y varies with output
according to the formula: $y = 2x^2$. What can be said about the cost
of getting additional output? If weekly output is running at $x_0$
gadgets, then the average (unit) cost of h more gadgets per week is
obtained as before. It is u of the table above, with $t_0$ replaced by $x_0$,
with seconds replaced by gadgets per week, and with u in terms of
£ per gadget. As h → 0, u → v and v is called the marginal cost, i.e. the
marginal rate of change of total cost:


Average cost over $[x_0, x_0 + h] = 4x_0 + 2h$; Marginal cost at $x_0 = 4x_0$.


From the given function (total cost $y = 2x^2$) is derived a second
function: marginal cost $v = 4x$. Here y in £ and v in £ per gadget are
functions of the weekly output x gadgets. Plotting as graphs as
above, we obtain the total cost curve ($y = 2x^2$) and the marginal
cost curve ($v = 4x$): the second is the ‘derivative’ of the first and the
first is got by ‘integration’ from the second.

Further such examples can be got from other fields. Suppose a 
roller is pushed up a lawn of increasing slope. Then it may be found 
that the force v lbs. against the roller increases with the distance x 
feet travelled, say by the formula $v = 4x$. The work done y foot-lbs.
also increases with the distance travelled: $y = 2x^2$. The relationship
here is the same as that just considered. The force is the marginal
rate of change of work done. From the work done graph ($y = 2x^2$) is
‘derived’ the graph of the force operating ($v = 4x$) and, conversely,
the graph of the force can be ‘integrated’ to give the work done. 

(iii) Slope of chord and tangent to a curve. We have considered a
function $y = 2x^2$ and we have derived a second function $v = 4x$. If x
is time, we interpret v as the instantaneous rate of growth of y over




time; if x is some physical variable (e.g. distance or output), we
interpret v as the marginal rate of change of y with respect to x. Now
concentrate attention on the curve which represents the function
$y = 2x^2$ in the plane Oxy, as shown in Fig. 10.1b.

Let P be the point on the curve at $x_0$ and Q at $x_0 + h$ so that
$OM = x_0$, $MN = h$. Then the chord PQ has slope:

\[ \frac{RQ}{PR} = \frac{NQ - NR}{MN} = \frac{NQ - MP}{ON - OM} = \frac{2(x_0 + h)^2 - 2x_0^2}{(x_0 + h) - x_0} \]

[Fig. 10.1b]

i.e. Slope of PQ $= 4x_0 + 2h$ (average rate of change).


As h → 0, Q approaches P and the chord PQ tends to a limiting position,
the tangent PT at P. So:


Slope of PT $= 4x_0$ (marginal rate of change).


To summarise, from a function such as $y = 2x^2$, we have derived a
second function $v = 4x$. Here v is the marginal rate of change of y with
respect to x or (if x happens to be time) the instantaneous rate of
growth of y over time. Further, v measures the slope of the tangent
to the curve $y = 2x^2$ at the appropriate point P. A study of v, as the
‘derivative’ of y, will clearly be rewarding in terms of the range of its 
applications. 


10.2. Derivatives. The calculus is based on the assumption that a
function y = f(x) is a continuous real-valued function of a continuous
real variable. The domain may be all x or some ‘half’ set such as
x > 0, but at least it should include some interval [a, b] where a < b.
We take x as varying through all real numbers in an interval
($a \le x \le b$) and y as varying continuously over the interval. As an
exception, we may allow the function to have one or more isolated
discontinuities, e.g. $y = 1/(1 - x)$ which is discontinuous (and not
defined) at x = 1. The first concept of the calculus is the ‘derivative’,
arising out of the considerations of 10.1:


DEFINITION: The function f(x) has derivative at $x = x_0$:

\[ f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \]

if the limit exists and the derived function f'(x) is defined on the domain
of real values of x for which derivatives exist.


The following remarks are needed in amplification of the definition. 
If $x_0$ is given, within the domain of x, then the expression

\[ \frac{f(x_0 + h) - f(x_0)}{h} \]

is the average rate of change of f(x) over the interval $[x_0, x_0 + h]$ of
varying length h. But h must be regarded as taking both positive
and negative real numerical values; the interval, in fact, is either
$[x_0, x_0 + h]$ if h > 0, or $[x_0 - (-h), x_0]$ if h < 0. On the other hand, the
expression is not defined for h = 0. Hence it is a function of h, not
defined for h = 0, but defined at all other points in a neighbourhood
of h = 0. If the expression has a limit as h → 0 through all real values,
i.e. through both positive and negative real h, then we get the derivative
$f'(x_0)$ as the limit of the average rate of change of f(x) from
$x = x_0$. This is the marginal rate of change of f(x) at $x = x_0$. (If there is
no limit, there is no derivative and no marginal rate of change.) As a
definition of rates of change and an interpretation of a derivative:


DEFINITION: If f(x) has derivative $f'(x_0)$, the marginal rate of change
of f(x) at $x = x_0$ is the limit (as h → 0) of the average rate of change
$\{f(x_0 + h) - f(x_0)\}/h$ and it is measured by $f'(x_0)$.

When there is no ambiguity, f'(x) can be called the rate of change of
f(x).

The essential point about a derivative is that it assumes continuous
variation, leaving a gap between theory and practice. In practice, x
takes discrete values, e.g. time in 1/10ths seconds, which can be
recorded. Only the average rate of change of f(x) is relevant. This
may be all we need. But, usually, we assume that x is capable of
continuous variation and that y varies continuously with x. We can
then write the marginal rate of change of f(x), the derivative f'(x).
This is generally accepted procedure, e.g. for distance varying 
continuously with time as in (i), or for cost varying continuously with 




output as in (ii) of 10.1. The fact that the assumption is made should 
not be overlooked. 
Various notations for a derivative of y =f (x) are in common use and 
it is convenient to have them. Each can be applied to y or to f(x):
(i) $y'$ or $f'(x)$ following Lagrange (1736-1813)
(ii) $D_x y$ or $D_x f(x)$ following Cauchy (1789-1857)
(iii) $\dfrac{dy}{dx}$ or $\dfrac{d}{dx} f(x)$ following Leibniz (1646-1716).


Of these, (i) is quite appropriate when the function is unspecified,
while (ii) is particularly useful for a particular function, e.g.
$D_x(2x^2) = 4x$.

The symbol $D_x$, or more simply D when there is no ambiguity, is to be
regarded as an operator. Thus $D(2x^2) = 4x$ means that 4x is the function
obtained by operating on $2x^2$ by writing its derivative at each x.
To indicate the derivative, the result of the operator D, at a particular
value $x_0$, we can write $[D(2x^2)]_{x_0} = 4x_0$. The oldest of the
notations is (iii) and it can be quite safely used if it is understood that
‘$\frac{d}{dx}$’ is a single symbol, the operator $D = \frac{d}{dx}$. However, the notation
used to be employed, and still is sometimes, in the form $\frac{dy}{dx}$ described
as a ‘differential coefficient’; the danger here is that $\frac{dy}{dx}$ gets separated
into the ratio of ‘dy’ to ‘dx’, which is without meaning (at least in the
present context).
The following is an important result:
THEOREM: A necessary condition that $f'(x_0)$ exists is that f(x) is
continuous at $x = x_0$, i.e. if $f'(x_0)$ exists, then f(x) is continuous at $x = x_0$.


The proof involves the properties of limits (9.8). If $f'(x_0)$ exists,
write $x = x_0 + h$ and let $x \to x_0$ (h → 0):

\[ \frac{f(x) - f(x_0)}{x - x_0} \to f'(x_0) \quad \text{as } x \to x_0. \]

Write:

\[ F(x) = \frac{f(x) - f(x_0)}{x - x_0}\,(x - x_0) + f(x_0). \]

Hence F(x) = f(x), except that F(x) is not defined, whereas f(x) is,
at $x = x_0$. Take the limit of F(x) as $x \to x_0$ (which does not involve any
value at $x = x_0$):

\[ F(x) \to f'(x_0) \times 0 + f(x_0) = f(x_0). \]

Hence $f(x) \to f(x_0)$ also, i.e. f(x) is continuous at $x = x_0$. Q.E.D.




It must be stressed that the condition is necessary, i.e. whenever
f'(x) exists, f(x) is continuous. It is not sufficient, i.e. if f(x) is
continuous, then f'(x) may or may not exist. To illustrate, consider

\[ f(x) = 1 - \sqrt{(1 - x)^2} = \begin{cases} 1 - (1 - x) = x & (x \le 1) \\ 1 - (x - 1) = 2 - x & (x \ge 1) \end{cases} \]

Then for variation around x = 1:

\[ \frac{f(1 + h) - f(1)}{h} = 1 \ (h < 0) \quad \text{and} \quad -1 \ (h > 0). \]

There is no limit as h → 0 through positive and negative values (10.9
Ex. 3). The function f(x) is continuous but without derivative at
x = 1.
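
The failure of the limit at x = 1 can be seen numerically. The sketch below assumes the function of the illustration, $f(x) = 1 - \sqrt{(1 - x)^2}$; the helper name `quotient` is a label chosen here.

```python
import math


def f(x):
    """f(x) = 1 - sqrt((1 - x)^2): equals x for x <= 1 and 2 - x for x >= 1."""
    return 1 - math.sqrt((1 - x) ** 2)


def quotient(x0, h):
    """Average rate of change of f over the interval from x0 to x0 + h."""
    return (f(x0 + h) - f(x0)) / h


# Approaching x = 1 from the left (h < 0) and from the right (h > 0).
for h in (-0.1, -0.001, 0.001, 0.1):
    print(h, quotient(1.0, h))
```

The quotients for negative h stay near 1 and those for positive h stay near -1, so no single limit is approached.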

The graphical version of a derivative makes the position clear.
If P and Q are the points $x = x_0$ and $x = x_0 + h$ respectively on the
curve y = f(x), then:

\[ \text{Slope of } PQ = \frac{f(x_0 + h) - f(x_0)}{h} \quad (h < 0 \text{ or } h > 0) \]

as in 10.1, (iii) above. Hence:
DEFINITION: The tangent PT to the curve y = f(x), at P where
$x = x_0$, is the line through P with slope $f'(x_0)$, if
the derivative exists. Otherwise there is no tangent at P.


The necessary condition is that, if a tangent exists at P, then the 
curve is continuous at P. The condition is not sufficient, i.e. if the 
curve is continuous at P, then there may or may not be a tangent 
at P. The case of failure occurs when the curve has a sharp point. 
In Fig. 10.2, the function (defined for all x > 0) is discontinuous at
x = 1 and continuous but without derivative at x = 2. The curve jumps
at the point $P_1$ of discontinuity; it does not jump, but the tangent
does, at the sharp point $P_2$. To the left of $P_2$, the tangent slopes
upwards; to the right, it slopes downwards.

[Fig. 10.2]

Hence, even when f(x) is continuous, it is necessary to ensure that
f'(x) exists before it is written. The next result is for a composite
function:


THEOREM: If f(x) has derivative $f'(x_0)$ and if F(u) has derivative
$F'(u_0)$ at $u_0 = f(x_0)$, then the composite function F{f(x)} has derivative
at $x_0$:

\[ [F'(u)]_{u_0 = f(x_0)} \times f'(x_0) \]

i.e. $D_x F\{f(x)\} = D_u F(u) \times D_x f(x)$ where u = f(x).
Proof: Write $u_0 = f(x_0)$ and $u_0 + k = f(x_0 + h)$ so that $k = f(x_0 + h) - f(x_0)$.
Then:

\[ \frac{F\{f(x_0 + h)\} - F\{f(x_0)\}}{h} = \frac{F(u_0 + k) - F(u_0)}{k} \times \frac{f(x_0 + h) - f(x_0)}{h} \]

From the properties of limits and since k → 0 as h → 0:

\[ \lim_{h \to 0} \frac{F\{f(x_0 + h)\} - F\{f(x_0)\}}{h} = \lim_{k \to 0} \frac{F(u_0 + k) - F(u_0)}{k} \times \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \]

i.e. $D_x F\{f(x)\} = D_u F(u) \times D_x f(x)$ at $x = x_0$. Q.E.D.
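
The composite rule can be checked numerically on a particular case. The sketch below is illustrative only: the choice $f(x) = 2x^2$, $F(u) = u^3$ and the helper `derivative` (a central difference quotient, which approximates the limit but is not the limit itself) are assumptions made here.

```python
def derivative(func, x0, h=1e-6):
    """Central difference quotient: a numerical stand-in for the derivative."""
    return (func(x0 + h) - func(x0 - h)) / (2 * h)


def f(x):
    """Inner function u = f(x)."""
    return 2 * x ** 2


def F(u):
    """Outer function F(u)."""
    return u ** 3


def composite(x):
    """The composite function F{f(x)} = 8x^6."""
    return F(f(x))


x0 = 1.5
u0 = f(x0)
lhs = derivative(composite, x0)                 # derivative of F{f(x)} at x0
rhs = derivative(F, u0) * derivative(f, x0)     # F'(u0) times f'(x0)
print(lhs, rhs)
```

Both values agree closely with the exact derivative $48x^5$ at x = 1.5.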


COROLLARY: If y = f(x) is continuous and increasing with derivative
$D_x f(x)$ at $x_0$, so that $x = f^{-1}(y)$ exists, is continuous and increasing, then:

\[ D_y f^{-1}(y) = \frac{1}{D_x f(x)} \quad \text{at } y_0 = f(x_0), \text{ provided } D_x f(x) \ne 0. \]

Proof: we have $x = f^{-1}(y) = f^{-1}\{f(x)\}$, i.e. $D_x(x) = D_x f^{-1}\{f(x)\}$.
By the theorem, $D_x f^{-1}\{f(x)\} = D_y f^{-1}(y) \times D_x f(x)$. Also $D_x(x) = 1$ (see
10.3 below). Hence $D_y f^{-1}(y) \times D_x f(x) = 1$, which proves the corollary.
Clearly it holds equally if f(x) is a decreasing function.


10.3. Operational rules for derivatives. Two situations occur in 
handling derivatives. The derivative may be needed for a function 
which is unspecified but written as a combination of other functions. 
Here we seek rules for the derivative of the function in terms of the 
derivatives of the separate components. In the other situation, the 
derivative is required for a particular function, e.g. a specified 
algebraic function. The problem is: given a particular function, write 
another particular function, the second being the derived function 
of the first. Here we need to have ready to hand a list of the deriva- 
tives, called standard forms, of the simplest particular functions; we 




rely upon the rules to give derivatives of more complicated functions. 
As a preliminary, two very special derivatives can be written: 
D(constant) = 0 and D(x) = 1.
These follow from the definition and they are evident from the
geometric point of view. The line y = constant, parallel to Ox, is its
own tangent and it has zero slope. The line y = x, also its own tangent,
has unit slope.

The first task is to establish, from the definition, rules for the 
derivatives of combinations of functions. The obvious rules to obtain 
are those which deal with sums, differences, products and quotients. 
The proofs of these rules are all similar and sufficiently illustrated 
by the product rule: If f(x) and g(x) are two functions with deriva- 
tives, write F(x) =f(x)g (x) so that 

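\[ \frac{F(x + h) - F(x)}{h} = \frac{f(x + h)\,g(x + h) - f(x)\,g(x)}{h} \]
\[ = f(x + h)\,\frac{g(x + h) - g(x)}{h} + g(x)\,\frac{f(x + h) - f(x)}{h} \]
\[ \to f(x)\,g'(x) + g(x)\,f'(x) \quad \text{as } h \to 0 \]

i.e. F'(x) exists and equals f(x)g'(x) + g(x)f'(x). (The step uses
f(x + h) → f(x), which holds since f(x) has a derivative and is therefore
continuous.)
The rules are assembled in the following table, together with those
for composite and inverse functions (last theorem and corollary,
10.2 above).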


Rule         Function        Derivative

Sum          f(x) + g(x)     f'(x) + g'(x)
Difference   f(x) - g(x)     f'(x) - g'(x)
Product      f(x)g(x)        f(x)g'(x) + g(x)f'(x)
Quotient     f(x)/g(x)       {g(x)f'(x) - f(x)g'(x)} / {g(x)}^2   if g(x) ≠ 0
Composite    F{f(x)}         F'(u)f'(x)   where u = f(x)
Inverse      f^{-1}(x)       1/f'(y)   if f'(y) ≠ 0

All the derivatives written are assumed to exist. In the last case,
f(y) is assumed to be increasing (or decreasing) with derivative f'(y),
so that the inverse $f^{-1}(x)$ is also increasing (or decreasing) with the
derivative shown.

Some particular cases arise from the fact that the derivative of a 




constant is zero. If A and k are constants, f(x) + A has derivative
f'(x) and kf(x) has derivative kf'(x). Write f(x) = 1, f'(x) = 0 in the
quotient rule to give the derivative of a reciprocal:

\[ \frac{1}{g(x)} \text{ has derivative } -\frac{g'(x)}{\{g(x)\}^2} \quad \text{if } g(x) \ne 0. \]

The rules are of such practical use that it is worth while repeating
them in an alternative notation. The particular cases are included.
Here u and v are two functions which possess derivatives; in the
composite rule, y is a function of u and u is a function of x, both with
derivatives.


Rule                      Derivative

Additive constant         D(u + A) = Du   (A constant)
Sum                       D(u + v) = Du + Dv
Difference                D(u - v) = Du - Dv
Multiplicative constant   D(ku) = kDu   (k constant)
Product                   D(uv) = uDv + vDu
Reciprocal                D(1/v) = -Dv / v^2   if v ≠ 0
Quotient                  D(u/v) = (vDu - uDv) / v^2   if v ≠ 0
Composite                 D_x y = D_u y × D_x u
Inverse                   D_x y = 1 / D_y x   if D_y x ≠ 0


The second task is to establish, from the definition, the derivatives
of the simplest functions. At the moment, for algebraic functions,
the only simple function to consider is $x^r$, where r is a rational
exponent. Indeed, we can get by with the two special derivatives:
D(constant) = 0 and D(x) = 1. As quite simple exercises in the
application of the rules, the steps are as follows (10.9 Ex. 5 and 6).
First, by the product rule and mathematical induction from D(x) = 1,
we obtain $D(x^n) = nx^{n-1}$, n a positive integer. Next, by the particular
case of the quotient rule: $D(1/x) = -1/x^2$. Then, by the composite
function rule, with $u = 1/x$, we find:

\[ D(x^{-n}) = D(u^n) = nu^{n-1} \times \left(-\frac{1}{x^2}\right) = -\frac{n}{x^{n+1}} = (-n)\,x^{-n-1}. \]

Further, write $y = \sqrt{x}$ so that $x = y^2$ with derivative $D_y x = 2y$; the
inverse function rule gives $D(\sqrt{x}) = \frac{1}{2\sqrt{x}}$, i.e. $D(x^{1/2}) = \tfrac{1}{2}x^{-1/2}$. The
derivatives of other surds, in general $x^{p/q} = (\sqrt[q]{x})^p$, are obtained similarly.
All this boils down to one standard form:

\[ D(x^r) = r\,x^{r-1} \quad (r \text{ rational}). \]

With the rules, this is enough for the derivative of any algebraic
function.
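
The standard form can be spot-checked numerically for a few rational exponents. This is only a sketch: the helper `derivative` (a central difference quotient) and the sample exponents and point x = 2 are choices made here, not part of the text.

```python
from fractions import Fraction


def derivative(func, x0, h=1e-6):
    """Central difference quotient: a numerical stand-in for the derivative."""
    return (func(x0 + h) - func(x0 - h)) / (2 * h)


def power_rule(r, x0):
    """Value of the standard form r * x^(r - 1) at x0."""
    return float(r) * x0 ** (float(r) - 1)


def numeric_power_derivative(r, x0):
    """Difference-quotient estimate of D(x^r) at x0."""
    return derivative(lambda x: x ** float(r), x0)


# Positive integer, negative integer and two surd exponents.
for r in (Fraction(3), Fraction(-2), Fraction(1, 2), Fraction(2, 3)):
    print(r, numeric_power_derivative(r, 2.0), power_rule(r, 2.0))
```

Each pair of printed values should agree to several decimal places, which is consistent with (though of course no proof of) the standard form.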

The fact that D(constant) = 0 is important in another connection.
The problem so far is: given f(x), find f'(x). Can the inverse problem
also be solved? Given f(x), what function F(x) is such that
F'(x) = f(x)? If such a F(x) can be found, it can be called an
anti-derivative of f(x). Clearly, in some cases, but only in some cases, the
rules and standard forms supply the anti-derivative, simply by
operating them in reverse. So:

If $f(x) = r\,x^{r-1}$, then $F(x) = x^r$ has F'(x) = f(x)
or if $f(x) = x^r$, then $F(x) = \dfrac{x^{r+1}}{r+1}$ has F'(x) = f(x).

An anti-derivative of $x^r$ is $\dfrac{x^{r+1}}{r+1}$ (provided $r \ne -1$). Again, suppose that f(x) can be
arranged in the form $\phi(x)\psi'(x) + \psi(x)\phi'(x)$, where $\phi(x)$ and $\psi(x)$ are
two particular functions, then $F(x) = \phi(x)\psi(x)$ is an anti-derivative
sought. For example:

\[ f(x) = \frac{1 + 3x}{2\sqrt{x}} = (1 + x)\,D(\sqrt{x}) + \sqrt{x}\,D(1 + x). \]

Hence $F(x) = (1 + x)\sqrt{x}$ has derivative F'(x) = f(x) and an
anti-derivative of $\dfrac{1 + 3x}{2\sqrt{x}}$ is $(1 + x)\sqrt{x}$.


One question has been forgotten here: if F(x) can be found so that
F'(x) = f(x), is F(x) unique? The answer is: not quite. Suppose that
G(x) is also such that G'(x) = f(x). Then the derivative of G(x) - F(x)
is G'(x) - F'(x) = f(x) - f(x) = 0. The only function with a zero
derivative everywhere is a constant; the only curve with a tangent
everywhere parallel to Ox is a line parallel to Ox. Hence G(x) - F(x) =
constant, and the general function with derivative f(x) is:

F(x) + A where F'(x) = f(x) and A = arbitrary constant.

In other words, if F(x) is any anti-derivative of f(x) which can be
found, then the general anti-derivative is obtained by adding any
constant to F(x). This lack of uniqueness is important, as we shall
see. It expresses the fact that, while multiplicative constants remain,
additive constants disappear in derivation.
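
As a rough numerical illustration of the last point, take the anti-derivative $(1 + x)\sqrt{x}$ of $(1 + 3x)/(2\sqrt{x})$ found in 10.3. The helper `derivative` and the constant A = 5 are assumptions made here for the sketch.

```python
import math


def derivative(func, x0, h=1e-6):
    """Central difference quotient: a numerical stand-in for the derivative."""
    return (func(x0 + h) - func(x0 - h)) / (2 * h)


def f(x):
    """The given function (1 + 3x) / (2 sqrt(x)), for x > 0."""
    return (1 + 3 * x) / (2 * math.sqrt(x))


def F(x):
    """An anti-derivative of f: (1 + x) sqrt(x)."""
    return (1 + x) * math.sqrt(x)


def G(x, A=5.0):
    """Another anti-derivative: F shifted by an arbitrary constant A."""
    return F(x) + A


x0 = 2.0
print(f(x0), derivative(F, x0), derivative(G, x0))
```

The three printed values agree closely: the additive constant leaves the derivative unchanged.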


10.4. Areas. The basic notion of an area is the product of two 
variables; a rectangle has area x x y where x and y are the lengths of 
adjacent sides. In elementary geometry, this leads to the area of a 
parallelogram (base × height) and of a triangle (½ base × height), and
so to the area of any closed figure bounded by lines. Something quite 
different is involved when an area is bounded by curves. The 
elementary approach is to approximate the area by inscribing some 
figure with lines as sides and by writing the area of the inscribed 
figure. The implicit assumption here is that the curvilinear area is the 
limit of the inscribed area as the fit gets closer. This is the idea behind 
the definition of the area of a circle in terms of inscribed (or
circumscribed) polygons (see 2.9 Ex. 8 and 9.9 Ex. 22).

To make this more systematic, we start by assuming, provisionally,
that a given figure does indeed have an area. This seems to be in order if
the figure is bounded by continuous curves, or by segments of continuous
curves and lines which are joined continuously. Then we proceed by two
stages. At the first stage, we insert a pair of co-ordinate axes and
divide up the area investigated in the way shown in Fig. 10.4a:

Area = ABL + LBCM + MCD + NEA - NDE.

[Fig. 10.4a]


Apart from triangles, the component areas are of the type LBCM, 
the area between a curve, the axis Ox and two vertical lines (parallel 
to Oy). The second stage is to obtain a measure of such an area, given 
the equation y =f (x) of the curve. 

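As an actual case, consider the curve of Fig. 10.4b, for the function
$y = x^2$ defined on the interval [1, 2], and seek a measure of the area
MPQN under the curve, above Ox and between MP (at x = 1) and
NQ (at x = 2). Divide MN into four equal segments, each of length $\tfrac{1}{4}$,
by inserting $M_1$, $M_2$ and $M_3$, with corresponding $P_1$, $P_2$ and $P_3$ on the
curve. Complete the four rectangles shown, each being under the curve,
so that the area A sought is greater than the sum of the areas of the
four rectangles:

[Fig. 10.4b]

\[ A > MPQ_1M_1 + M_1P_1Q_2M_2 + M_2P_2Q_3M_3 + M_3P_3Q_4N \]
\[ = \tfrac{1}{4} \times MP + \tfrac{1}{4} \times M_1P_1 + \tfrac{1}{4} \times M_2P_2 + \tfrac{1}{4} \times M_3P_3 \]
\[ = \tfrac{1}{4}\left\{1^2 + \left(\tfrac{5}{4}\right)^2 + \left(\tfrac{6}{4}\right)^2 + \left(\tfrac{7}{4}\right)^2\right\} = \tfrac{1}{64}\left(4^2 + 5^2 + 6^2 + 7^2\right) = \tfrac{63}{32}. \]

Similarly, A is less than the sum of the four larger rectangles shown:

\[ A < \tfrac{1}{4}\left\{\left(\tfrac{5}{4}\right)^2 + \left(\tfrac{6}{4}\right)^2 + \left(\tfrac{7}{4}\right)^2 + 2^2\right\} = \tfrac{1}{64}\left(5^2 + 6^2 + 7^2 + 8^2\right) = \tfrac{87}{32}. \]

Hence from this particular partition of MN: $\tfrac{63}{32} < A < \tfrac{87}{32}$.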


We are not very close, but we can get closer by taking a finer parti- 
tion of MN, i.e. by approximating by means of larger numbers of 
thinner rectangles. 
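
The two rectangle sums are easily mechanised, which makes it painless to try finer partitions. The following is a minimal sketch for $y = x^2$ on [1, 2] with n equal segments, using exact fractions; the function name `rectangle_bounds` is a label chosen here.

```python
from fractions import Fraction


def rectangle_bounds(n):
    """Sums of the n rectangles under and over y = x^2 on [1, 2]."""
    h = Fraction(1, n)                              # width of each segment
    points = [1 + r * h for r in range(n + 1)]      # dividing points 1, 1+h, ..., 2
    # The curve is increasing, so the left end gives the lower rectangle
    # and the right end gives the upper rectangle on each segment.
    lower = sum(h * p ** 2 for p in points[:-1])
    upper = sum(h * p ** 2 for p in points[1:])
    return lower, upper


for n in (4, 8, 16):
    lower, upper = rectangle_bounds(n)
    print(n, lower, upper, float(lower), float(upper))
```

With n = 4 this reproduces the bounds 63/32 and 87/32 obtained above; larger n narrows the gap.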

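Consider, therefore, a partition of MN into n segments each of
length $h = \frac{1}{n}$. The dividing points are:

\[ 1,\ 1 + h,\ 1 + 2h,\ 1 + 3h,\ \dots,\ 1 + (n - 1)h,\ 1 + nh = 2 \]

and the corresponding heights of the curve are the squares of these
values. Then, by adding the areas of the n rectangles under the curve:

\[ A > h\left\{1^2 + (1 + h)^2 + (1 + 2h)^2 + \dots + (1 + \overline{n - 1}\,h)^2\right\} \]
\[ = \frac{1}{n}\left\{\left(\frac{n}{n}\right)^2 + \left(\frac{n + 1}{n}\right)^2 + \left(\frac{n + 2}{n}\right)^2 + \dots + \left(\frac{2n - 1}{n}\right)^2\right\} \]
\[ = \frac{1}{n^3}\left\{n^2 + (n + 1)^2 + (n + 2)^2 + \dots + (2n - 1)^2\right\}. \]

A simple algebraic result (proved by induction) is that

\[ 1^2 + 2^2 + 3^2 + \dots + n^2 = \tfrac{1}{6}\,n(n + 1)(2n + 1). \]

So:

\[ 1^2 + 2^2 + 3^2 + \dots + (2n - 1)^2 = \tfrac{1}{6}(2n - 1)(2n)(4n - 1) = \tfrac{1}{3}\,n(2n - 1)(4n - 1) \]
\[ 1^2 + 2^2 + 3^2 + \dots + (n - 1)^2 = \tfrac{1}{6}(n - 1)(n)(2n - 1) = \tfrac{1}{6}\,n(n - 1)(2n - 1) \]

i.e.

\[ n^2 + (n + 1)^2 + (n + 2)^2 + \dots + (2n - 1)^2 = \tfrac{1}{3}\,n(2n - 1)(4n - 1) - \tfrac{1}{6}\,n(n - 1)(2n - 1) = \tfrac{1}{6}\,n(2n - 1)(7n - 1). \]

Hence:

\[ A > \frac{n(2n - 1)(7n - 1)}{6n^3} = \frac{7}{3}\left(1 - \frac{1}{2n}\right)\left(1 - \frac{1}{7n}\right). \]

In the same way, by adding the areas of the n rectangles above the
curve:

\[ A < \frac{7}{3}\left(1 + \frac{1}{2n}\right)\left(1 + \frac{1}{7n}\right). \]

With the n-fold partition, A is found to be contained in the interval:

\[ \left[\frac{7}{3}\left(1 - \frac{1}{2n}\right)\left(1 - \frac{1}{7n}\right),\ \frac{7}{3}\left(1 + \frac{1}{2n}\right)\left(1 + \frac{1}{7n}\right)\right]. \]

In retrospect, what we have achieved is a definition of a limit
process for the area A, the stages being the sequence of finer partitions
as n = 1, 2, 3, ... increases. As n → ∞, i.e. as the rectangles get
thinner without bound (h → 0), the interval for A converges to 7/3.
This is our measure of the area MPQN: A = 7/3.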
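
The closed forms for the two bounds can be checked against the direct sums of squares. The sketch below uses exact fractions; the function names are labels chosen here, and the formulas are those of the interval above.

```python
from fractions import Fraction


def lower_sum(n):
    """(1/n^3) {n^2 + (n+1)^2 + ... + (2n-1)^2}, summed directly."""
    return Fraction(sum(k * k for k in range(n, 2 * n)), n ** 3)


def upper_sum(n):
    """(1/n^3) {(n+1)^2 + (n+2)^2 + ... + (2n)^2}, summed directly."""
    return Fraction(sum(k * k for k in range(n + 1, 2 * n + 1)), n ** 3)


def lower_closed(n):
    """Closed form (7/3)(1 - 1/2n)(1 - 1/7n)."""
    return Fraction(7, 3) * (1 - Fraction(1, 2 * n)) * (1 - Fraction(1, 7 * n))


def upper_closed(n):
    """Closed form (7/3)(1 + 1/2n)(1 + 1/7n)."""
    return Fraction(7, 3) * (1 + Fraction(1, 2 * n)) * (1 + Fraction(1, 7 * n))


for n in (1, 4, 10, 100, 1000):
    print(n, float(lower_closed(n)), float(upper_closed(n)))
```

The width of the interval is exactly 3/n, so it shrinks to zero around 7/3 as n increases.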


10.5. Integrals. Given a function y = f(x), defined on an interval
[a, b], where a < b, the integral from a to b is to be defined in such a
way that it is interpreted graphically as the area between the curve
y = f(x) and the axis Ox, and between the two lines parallel to the
axis Oy at x = a and x = b respectively. If f(x) is never negative on
[a, b], the area in question is that shown as A in Fig. 10.5a. The
development of 10.4, which is essentially algebraic and which involves
a limit process, provides the basis for the definition and evaluation
of the integral and area.

[Fig. 10.5a]

We do need, however, to take a closer look at the development. The
limit process used is a sequence of partitions; the interval is split
into n equal segments, where n = 1, 2, 3, ... increases without bound.
This is not quite good enough for a function f(x) of a continuous
variable x. If the integral or area does exist (as assumed at the
beginning of 10.4), we can find it by any convenient limit process, e.g.
the sequential process adopted. But, to define an integral and to
establish that it exists, we must be careful (as in 9.6) to consider any
limit process and not just a particular sequential process. A strict
definition, not dependent on a sequence, is given in 15.5. It proceeds
on the following lines.

It is assumed only that y = f(x) is bounded on the interval [a, b].
A partition P of the interval is a set of dividing points $x_0, x_1, x_2, \dots, x_n$
where $a = x_0$ and $x_n = b$; otherwise the points can take any values
whatever, arranged in ascending order from a to b. The interval is
divided into segments of lengths: $x_1 - x_0,\ x_2 - x_1,\ x_3 - x_2,\ \dots,\ x_n - x_{n-1}$.
These are not limited, either in number (n) or in lengths, except that
they add to b - a. Consider the rth segment (r = 1, 2, 3, ... n) of length
$x_r - x_{r-1}$. Let $x_r'$ be any point of this segment ($x_{r-1} \le x_r' \le x_r$) and
write $f(x_r')$ with $x_r'$ varying over the segment. Then $f(x_r')$, being
bounded, must have a GLB $L_r$ and a LUB $G_r$ in the segment. Hence:

\[ L_r\,(x_r - x_{r-1}) \le f(x_r')\,(x_r - x_{r-1}) \le G_r\,(x_r - x_{r-1}) \qquad (1) \]

The significance of (1) can be seen in terms of Fig. 10.5b, which is
an enlarged picture of the rth segment. The product $f(x_r')(x_r - x_{r-1})$
is the area of the rectangle shown solid in the diagram, the height
being fixed by the value $f(x_r')$ at the selected point $x_r'$. As $x_r'$ varies
over the segment, this product varies with the changing height of the
rectangle. Its lower bound is $L_r(x_r - x_{r-1})$, shown by the area of the
lower rectangle in Fig. 10.5b. Its upper bound is $G_r(x_r - x_{r-1})$,
similarly shown as the area of the upper rectangle in the figure. The
‘area’ under the curve, if it can be defined, is also somewhere in this
range.

[Fig. 10.5b]

Select a value $x_r'$ in every segment and specify the pair of bounds
$L_r$ and $G_r$ for $f(x_r')$ in each case. In many cases, $L_r$ and/or $G_r$ will
coincide with one or both of $f(x_{r-1})$ and $f(x_r)$, but this is by no means
necessary; the case illustrated above has $L_r = f(x_{r-1})$ and $G_r \ne f(x_r)$.
Add the products (1) for all segments, r = 1, 2, 3, ... n, to give:


\[ \sum_{r=1}^{n} L_r\,(x_r - x_{r-1}) \le \sum_{r=1}^{n} f(x_r')\,(x_r - x_{r-1}) \le \sum_{r=1}^{n} G_r\,(x_r - x_{r-1}) \qquad (2) \]

in terms of the Σ notation for sums. The middle sum in (2),
$\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$,
depends on, and varies with, the selection of the $x_r'$ in the various
segments of a given partition P. The other sums, involving $L_r$ and
$G_r$ as fixed for each segment, depend in no way on this selection; they
depend only on what partition P is given. Write these sums as
L(P) and G(P) respectively, to indicate this fact. Hence the sum
$\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$ is bounded within the given interval for a given P:

\[ F(P) = [L(P),\ G(P)]. \]

This interval is the smallest of all such intervals which can be
specified.

We now have a limit process. The stages are given by all the
various partitions P which can be written. They do not form a
sequence. On the other hand, we can say what we mean by advancing
through stages: $P_2 > P_1$ can be taken as implying that the partition
$P_2$ is a refinement of $P_1$ in the sense that $P_2$ contains all the dividing
points of $P_1$ and some others as well. The value under consideration
is the sum $\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$, i.e. the sum of all rectangular areas of
the kind shown in Fig. 10.5b. This is contained in the smallest
interval F(P), given P. It is easily seen that, if $P_2 > P_1$, then $F(P_2)$
is an interval contained in $F(P_1)$. The question is: in this limit
process, as we advance through finer and finer partitions P, does the
interval F(P) converge to a single value or not? If it does, then
$\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$ has a limit and the limit is what we mean by the
area A under the curve between a and b. The limit is called the
integral:
DEFINITION: If f(x) is bounded on [a, b], where a < b, and if the limit
process F(P) of the sum $\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$ converges over stages P as
the partition P is refined, then the integral of f(x) from a to b exists:

\[ \int_a^b f(x)\,dx = \lim_{P} \sum_{r=1}^{n} f(x_r')\,(x_r - x_{r-1}). \]


If $f(x) \ge 0$ on [a, b], then $\int_a^b f(x)\,dx$ is the area between the curve y = f(x)
and the axis Ox and between x = a and x = b (Fig. 10.5a).
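
The bounds L(P) and G(P), and the effect of refining a partition, can be illustrated on a simple case. The sketch below makes a simplifying assumption not required by the definition: f is increasing, so that the GLB and LUB on each segment are the values at its left and right ends. The function name `bounds_and_sum` and the particular partitions and selected points are choices made here.

```python
from fractions import Fraction as Fr


def bounds_and_sum(f, partition, tags):
    """Return L(P), the sum for the selected points, and G(P), for increasing f."""
    segments = list(zip(partition[:-1], partition[1:]))
    lower = sum(f(lo) * (hi - lo) for lo, hi in segments)    # GLB at the left end
    upper = sum(f(hi) * (hi - lo) for lo, hi in segments)    # LUB at the right end
    tagged = sum(f(t) * (hi - lo) for t, (lo, hi) in zip(tags, segments))
    return lower, tagged, upper


def square(x):
    """The function y = x^2, increasing on [1, 2]."""
    return x * x


P1 = [Fr(1), Fr(6, 5), Fr(2)]                     # an unequal partition of [1, 2]
P2 = [Fr(1), Fr(6, 5), Fr(3, 2), Fr(2)]           # a refinement of P1
L1, S1, G1 = bounds_and_sum(square, P1, [Fr(11, 10), Fr(8, 5)])
L2, S2, G2 = bounds_and_sum(square, P2, [Fr(11, 10), Fr(13, 10), Fr(7, 4)])
print(float(L1), float(L2), float(S2), float(G2), float(G1))
```

The printed values are in ascending order: the interval for the refinement lies inside the interval for the coarser partition, and the sum for the selected points lies inside both.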

In amplification of this definition, which is in purely algebraic and
limit terms, it is to be stressed that $\int_a^b f(x)\,dx$ depends on the form of
the function and on the values assigned to a and b, but not on the
variable x itself. The variable is ‘integrated out’ over the interval
[a, b]. Hence

\[ \int_a^b f(x)\,dx = \int_a^b f(u)\,du = \int_a^b f(t)\,dt = \dots \]

being always the same area or integral whatever label is given to the
variable. The value of the integral only changes if a different function
g replaces f or if a different interval [c, d] replaces [a, b].


The notation $\int_a^b f(x)\,dx$ is perhaps not the best which could be
devised. Indeed, when the integral is viewed as an operator applied
to the function f(x), a quite different notation is later introduced.
But $\int_a^b f(x)\,dx$ is the notation in common use and its origin is as
follows. The sum $\sum f(x_r')(x_r - x_{r-1})$ means select $x_r'$ in the segment
$(x_r - x_{r-1})$, form the product shown and add for all segments from
a to b. It is sometimes written $S\,f(x)\,\Delta x$, where x is selected in a
segment of length $\Delta x$ around x, the product formed and the sum taken
for all segments $\Delta x$ adding to b - a. In the limit $S\,f(x)\,\Delta x \to \int_a^b f(x)\,dx$.

This is only dangerous if the meaningless part ‘f(x) dx’ of the notation
is separated off to be read as the value of the function f(x) times
a ‘small increment’ or ‘differential’ dx.

The evaluation of the integral of particular functions, and the 
establishment of properties of integrals, from the definition is an 
extremely tricky and laborious business. Even if we are prepared to 
assume that the integral exists (which is in fact the case for any 
continuous function, as stated in 10.6 below), we are still left with 
the laborious method of finding the value of the integral as a sequential
limit, for a partitioning of [a, b] into n equal segments (10.4).
This is not a practical proposition. We abandon it right away and
look for something quicker. Fortunately, we find something better
in 10.7 below.

Meanwhile we can write down a few properties and particular 
integrals which can be established without too much trouble from 
the basic definition. First, proceeding on the lines of 10.4 (see 10.9 
Ex. 17), we get: 


\[ \int_a^b k\,dx = k(b - a) \ (k \text{ constant}); \qquad \int_a^b x\,dx = \tfrac{1}{2}(b^2 - a^2) \qquad (3) \]


Next, the following simple but useful properties follow for functions 
f(x) and g(x) which have integrals: 


\[ \int_a^b \{k\,f(x)\}\,dx = k \int_a^b f(x)\,dx \quad (k \text{ constant}) \]

\[ \int_a^b \{f(x) + g(x)\}\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]


But nothing, as yet, can be offered for the integral of a product (or
quotient) of two functions. To illustrate, a sketch of the proof of the
sum property is given: use the same partition P of [a, b] for both f(x)
and g(x), writing $\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$ as contained in $[L_1(P), G_1(P)]$
and $\sum_{r=1}^{n} g(x_r')(x_r - x_{r-1})$ as contained in $[L_2(P), G_2(P)]$. Then
$\sum_{r=1}^{n} \{f(x_r') + g(x_r')\}(x_r - x_{r-1})$ is contained in the interval [L(P), G(P)]
where $L(P) = L_1(P) + L_2(P)$ and $G(P) = G_1(P) + G_2(P)$. Taking the
limit over stages P, the result for $\int_a^b \{f(x) + g(x)\}\,dx$ follows.


Finally, as a property of a different kind:

\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \quad (a < c < b) \qquad (4) \]

To prove this, it is only necessary to concentrate on all those
partitions P which include the given c as one of the dividing points; all
sums then split into two, one part for the interval [a, c] and the other
for [c, b], giving the result shown.
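
These properties can be tried out on approximating sums. The sketch below is illustrative only: it uses a sum over n equal segments with the selected point at the middle of each segment (one convenient choice among many), and the function name `midpoint_sum` and the sample functions are assumptions made here.

```python
def midpoint_sum(f, a, b, n=10000):
    """Sum of f(x_r') * (segment length) over n equal segments of [a, b],
    with x_r' taken at the middle of each segment."""
    h = (b - a) / n
    return sum(f(a + (r + 0.5) * h) for r in range(n)) * h


def f(x):
    return x * x


def g(x):
    return 3.0 * x + 1.0


a, c, b = 0.0, 1.0, 2.0
whole = midpoint_sum(f, a, b)
parts = midpoint_sum(f, a, c) + midpoint_sum(f, c, b)          # property (4)
sum_rule = midpoint_sum(lambda x: f(x) + g(x), a, b)           # sum property
separate = midpoint_sum(f, a, b) + midpoint_sum(g, a, b)
print(whole, parts, sum_rule, separate)
print(midpoint_sum(lambda x: 5.0, 1.0, 3.0), midpoint_sum(lambda x: x, 1.0, 3.0))  # results (3)
```

The sums agree to within the approximation error, as the properties lead us to expect; this is a check on particular cases, not a proof.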


It is not convenient in practice to confine integrals to intervals
[a, b] which are such that a < b. We often need to write $\int_a^b f(x)\,dx$
even if a = b or a > b. This is only a matter of adopting appropriate
conventions:

\[ \int_a^a f(x)\,dx = 0 \quad \text{and} \quad \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \]


The first says that the area on an interval of zero length is zero. The
second convention is that an area from right to left, on the interval
[a, b] which runs from right to left (a > b), is equal in numerical value
but opposite in sign to the corresponding area on the interval [b, a]
from left to right.

Some matters concerned with the use of integrals as areas can be
cleared up. We need to check that, as particular cases, the areas of
rectangles and triangles given by integrals agree with the elementary
notions of these areas. Consider the rectangle OLQN, with sides
a and k, shown in Fig. 10.5c. This is the area under the line y = k, from
x = 0 to x = a: $\int_0^a k\,dx = ka$ by (3), i.e. area is base × height as required.

[Fig. 10.5c]

Consider, further, the triangle OQN of Fig. 10.5c, with base a and
height b. This is the area under the line $y = \frac{b}{a}x$ from x = 0 to x = a. For,
if P(x, y) is a point on the line, then:

\[ \frac{y}{x} = \frac{MP}{OM} = \frac{NQ}{ON} = \frac{b}{a}. \]

\[ \text{Hence, area} = \int_0^a \frac{b}{a}\,x\,dx = \frac{b}{a}\int_0^a x\,dx = \frac{b}{a}\left(\tfrac{1}{2}a^2\right) = \tfrac{1}{2}ab \quad \text{by (3).} \]

Again, the area is as required: ½ base × height.


The integral and area $\int_a^b f(x)\,dx$ is the limit of $\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$, a
purely algebraic concept. An important matter of signs has to be
considered. If $f(x) \ge 0$ everywhere over the interval [a, b], then the
sum, and its limit the integral, is essentially positive. $\int_a^b f(x)\,dx$ is the
area A of Fig. 10.5a. But, if f(x) < 0 anywhere over the interval,
then there are negative as well as positive terms in the sum
$\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$ and the algebraic sum (and hence the integral
as limit) is the net balance of positive and negative terms. The
position can be represented as in Fig. 10.5d. Suppose $f(x) \le 0$
everywhere over [a, b]. Then the sum consists entirely of negative
terms and the area (of numerical value B) is entirely below Ox:
$B = -\int_a^b f(x)\,dx$. However, suppose the curve y = f(x) crosses
Ox at x = c, such that f(x) > 0 for $a \le x < c$, f(x) = 0 at x = c, f(x) < 0
for $c < x \le b$. Then the sum consists partly of positive and partly
of negative terms and the integral is the net balance between the
areas C above Ox and D below Ox shown in the diagram. To get the
numerical values (C and D) of the areas, we split the interval into
two parts at c and we use the result (4) above:

[Fig. 10.5d: $C - D = \int_a^b f(x)\,dx$]

\[ C = \int_a^c f(x)\,dx \quad \text{and} \quad D = -\int_c^b f(x)\,dx \]

i.e. the total numerical area is:

\[ \int_a^c f(x)\,dx - \int_c^b f(x)\,dx = C + D \]

whereas the total integral is:

\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = C - D. \]


In using integrals, particularly with reference to the evaluation of 
areas, we must make allowance for the fact that they are algebraic, 
and not numerical, sums. 
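The distinction between the algebraic sum and the numerical area can be checked on a computer. The following is only a sketch: it approximates the integrals by midpoint sums (the helper `midpoint_sum` and the number of strips are illustrative choices, not part of the text) and uses the curve $y = x^3 - 3x^2 + 2x$ of 10.9 Ex. 19, which crosses Ox at $x = 1$.

```python
def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] by a sum of n midpoint strips."""
    width = (b - a) / n
    return sum(f(a + (r + 0.5) * width) for r in range(n)) * width


def cubic(x):
    # y = x^3 - 3x^2 + 2x: positive on (0, 1), zero at x = 1, negative on (1, 2)
    return x**3 - 3 * x**2 + 2 * x


area_c = midpoint_sum(cubic, 0.0, 1.0)        # C: area above Ox
area_d = -midpoint_sum(cubic, 1.0, 2.0)       # D: area below Ox, sign reversed
net_integral = midpoint_sum(cubic, 0.0, 2.0)  # the integral itself, C - D
numerical_area = area_c + area_d              # total numerical area, C + D

print(f"C = {area_c:.6f}, D = {area_d:.6f}")
print(f"integral C - D = {net_integral:.6f}, area C + D = {numerical_area:.6f}")
```

The integral over the whole interval comes out close to zero although the curve encloses a numerical area of about one half, which is the allowance for signs described above.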




10.6. The fundamental theorem of the calculus. The theorem now to 
be stated and applied is one of the most extraordinary in all mathe- 
matics. It is simple enough to state and it says two things. One is that 
the integral of a continuous function always exists, a result which is, 
to say the least, extremely convenient and useful. The other is that 
integration is the inverse operation to derivation. It is this which 
gives the theorem its great power; it goes right down to the funda- 
mentals of the calculus. 

Some preliminary considerations will help us to appreciate this powerful theorem. First we need to get an integral into the form of a function of x. So far, the integral and area are written $\int_a^b f(x)\,dx$, depending on a and b. Write:

$$F(x) = \int_a^x f(u)\,du \qquad (1)$$


where the variable of integration (which is a matter only of labelling) is written as u to prevent any ambiguity with the variable x, which now stands for the upper end of the interval [a, x] over which the integral is obtained. Then F(x) is an integral as a function of x, as required. We must remember, however, that the value of a, the lower or fixed end of the interval [a, x] used, also enters into F(x).

Suppose that we have two functions, F(x) and f(x), related in such a way that $F'(x) = f(x)$. In other words, f(x) is the derivative of F(x) and (if it exists at all) f(x) is uniquely obtained from F(x). The curve $y = F(x)$ has a tangent at x with slope equal to the height of the curve $y = f(x)$. Or, what comes to the same thing, F(x) is an anti-derivative of f(x) and, as such, is subject to the addition of an arbitrary constant (10.3). The suggestion of the example (i) of 10.1 is that the area under the curve $y = f(x)$ on the interval [a, x] is equal to the height of the curve $y = F(x)$ at x. If it is true, then $\int_a^x f(u)\,du = F(x)$ where $F'(x) = f(x)$, i.e. integration is the inverse of derivation and the integral (1) is the same as an anti-derivative F(x) of f(x). To establish this relation is the job of the Fundamental Theorem of the Calculus.

There are, then, some further properties which we know or expect to hold. In derivation, we pass from F(x) to f(x) where $F'(x) = f(x)$. We know we cannot count on getting f(x) from F(x), even if F(x) is continuous. We must keep an eye open for the odd case where a continuous F(x) fails to have a derivative, i.e. where the curve $y = F(x)$ has one or more sharp points. In integration, we are proceeding the other way, from f(x) to its integral or anti-derivative F(x). Here we expect that, if f(x) is continuous, then F(x) exists. Our expectation is based on intuition; it seems right that an area exists under a curve which has no discontinuities. Again, it is the job of the Fundamental Theorem to establish that our intuitive conclusion is correct.

Further, any additive constant disappears in derivation. If $F'(x) = f(x)$, then $G(x) = F(x) + A$ has the same derivative: $G'(x) = f(x)$. The inverse process of finding an anti-derivative F(x) of f(x) is always subject to the re-introduction of an arbitrary constant. We can only say that F(x) is an anti-derivative; the general form of the anti-derivative includes an additive constant. When the integral (1) is associated with the anti-derivative of f(x), the question of how to incorporate the additive constant remains. The Fundamental Theorem shows how this is related to the fixing of the lower end a of the interval [a, x] used in (1).


FUNDAMENTAL THEOREM OF THE CALCULUS: If f(x) is defined and continuous on the interval [a, b], then at each x of [a, b]:

$$F(x) = \int_a^x f(u)\,du$$

exists, is continuous and has derivative $F'(x) = f(x)$.

COROLLARY: If f(x) is continuous with anti-derivative F(x) at each x of [a, b], then:

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

The proof of the theorem is difficult. It is set out in 15.5 but omitted here. The corollary follows easily: by the theorem, $\int_a^x f(u)\,du$ and F(x) have the same derivative f(x) and so differ only by a constant. Hence:

$$\int_a^x f(u)\,du = F(x) + A.$$

Put $x = a$: $0 = F(a) + A$, i.e. $A = -F(a)$.

Put $x = b$: $\int_a^b f(u)\,du = F(b) + A = F(b) - F(a)$.

Hence: $\int_a^b f(x)\,dx = F(b) - F(a)$. Q.E.D.
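Both statements can be tried out numerically. The sketch below is an illustration only, not a proof: it takes $f(x) = x^2$ with anti-derivative $F(x) = \frac{1}{3}x^3$ on [1, 2] as an assumed example, approximates the integral as a limit by a midpoint sum (the helper `midpoint_sum` and the step sizes are illustrative choices), and compares the result with $F(b) - F(a)$. It then estimates the slope of the area function at one point and compares it with f.

```python
def midpoint_sum(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] by a sum of n midpoint strips."""
    width = (b - a) / n
    return sum(f(a + (r + 0.5) * width) for r in range(n)) * width


def f(x):
    return x * x


def F(x):
    # an anti-derivative of f, since D(x^3 / 3) = x^2
    return x**3 / 3


a, b = 1.0, 2.0
limit_value = midpoint_sum(f, a, b)  # the integral approached as a limit
corollary_value = F(b) - F(a)        # the corollary: F(b) - F(a) = 7/3


def area_function(x):
    """The integral of f over [a, x], regarded as a function of the upper end x."""
    return midpoint_sum(f, a, x)


h = 1e-3
x0 = 1.5
# central-difference estimate of the derivative of the area function at x0
slope = (area_function(x0 + h) - area_function(x0 - h)) / (2 * h)

print(limit_value, corollary_value)
print(slope, f(x0))
```

The two values in each printed pair agree to several decimal places, as the theorem and its corollary require.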


A convenient exposition and a suitable notation for integration as inverse derivation can now be set out. Given only that f(x) is defined and continuous on [a, b], the Fundamental Theorem states that an anti-derivative of f(x) exists. Let F(x) be any form of it, i.e. any function found to be such that $F'(x) = f(x)$. Then the general form is $G(x) = F(x) + A$, where A is some constant. The integral of f(x) can be written in three ways in terms of F(x) where $F'(x) = f(x)$:

Firstly:

$$\int_a^b f(x)\,dx = F(b) - F(a) \quad \text{or} \quad \int_a^b F'(x)\,dx = F(b) - F(a) \qquad (2)$$

The arbitrary element does not appear in this form; it goes when the interval [a, b], over which the integral (2) is written, is fixed. For, if $G(x) = F(x) + A$, then:

$$G(b) - G(a) = \{F(b) + A\} - \{F(a) + A\} = F(b) - F(a).$$

The form (2) holds whatever anti-derivative F is taken.

Secondly:

$$\int_a^x f(u)\,du = F(x) - F(a) \quad \text{or} \quad \int_a^x F'(u)\,du = F(x) - F(a) \qquad (3)$$

Allowance is again made for the arbitrary element, in the specification of the fixed end a of the interval [a, x], over which the integral (3) is written. The arbitrary constant appears as $A = -F(a)$, i.e. it is a which is arbitrary in (3).

Thirdly:

$$\int f(x)\,dx = F(x) + A \quad \text{or} \quad \int F'(x)\,dx = F(x) + A \quad (A \text{ arbitrary constant}) \qquad (4)$$

can be used as a convenient notation. It implies that the integral is written over some interval [a, x] where x is the variable of the function F(x) and where a is arbitrary and absorbed into the constant $A = -F(a)$.

Of these notations, (2) is called the definite integral and (4) is the indefinite integral. The link between them is (3) which shows how the arbitrary element is switched from A to a by means of $A = -F(a)$.




In the definite integral, there is no function of x. The value $F(b) - F(a)$ shows the dependence of the integral on the interval [a, b] over which it is taken. In the indefinite integral, the emphasis is on the function F(x). The variable x is the upper end of the interval [a, x] over which the integral is taken. There is an additive constant A to allow for an arbitrary lower end a of the interval.


10.7. Integration in practice. The position is that we have abandoned, as too laborious in practice, the evaluation of the definite integral $\int_a^b f(x)\,dx$ as a limit. Instead we concentrate on the indefinite integral $\int f(x)\,dx = F(x) + A$, where F(x) is some anti-derivative of f(x) and where A is an arbitrary constant. We seek a function F(x) such that $F'(x) = f(x)$, knowing at least that it must exist if f(x) is continuous. Once F(x) is found, there is no difficulty whatever in evaluating a definite integral of f(x):

$$\text{If } \int f(x)\,dx = F(x) + A, \text{ then } \int_a^b f(x)\,dx = \Big[F(x)\Big]_a^b = F(b) - F(a)$$

and the arbitrary constant disappears.

Note that we seek an anti-derivative not the anti-derivative. There 
are various anti-derivatives and one differs from another by a 
constant. This is not quite as trivial as it might seem. For example: 


$$D\{x(x + 2)\} = D(x^2 + 2x) = D(x^2) + 2D(x) = 2x + 2 = 2(x + 1)$$

$$\text{i.e.} \quad \int 2(x + 1)\,dx = x(x + 2) + A$$

and an anti-derivative of $2(x + 1)$ is $x(x + 2)$. We could equally well have found $(x + 1)^2$ or $(x - 1)(x + 3)$. Though these look different, they do in fact differ from $x(x + 2)$ only by a constant. We should never
be surprised to find two apparently different anti-derivatives of a 
given function on offer; they can be perfectly respectable in that they 
differ by a constant. 
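The point can be confirmed by direct evaluation. In the sketch below (the sample points, the step h and the helper `derivative` are arbitrary illustrative choices), each of the three candidates is compared with $x(x + 2)$ at several values of x, and its slope is estimated by a central difference.

```python
def derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


candidates = {
    "x(x+2)": lambda x: x * (x + 2),
    "(x+1)^2": lambda x: (x + 1) ** 2,
    "(x-1)(x+3)": lambda x: (x - 1) * (x + 3),
}

sample_points = [-2.0, -0.5, 0.0, 1.0, 3.5]
reference = candidates["x(x+2)"]

# difference from x(x+2) at each sample point: constant if both are anti-derivatives
offsets = {
    name: [g(x) - reference(x) for x in sample_points]
    for name, g in candidates.items()
}

# estimated slope of each candidate, to be compared with 2(x+1)
slopes = {
    name: [derivative(g, x) for x in sample_points]
    for name, g in candidates.items()
}

for name in candidates:
    print(name, offsets[name])
```

Each list of differences is constant (0, 1 and -3 respectively), and every estimated slope agrees with $2(x + 1)$.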

Viewed in this way, integration is just reverse derivation. However, we shall see that the rules of the game are a little more difficult for integration. The set of operational rules for derivatives (10.3) does not provide so tidy a set of rules for integrals. Essentially, integration is a hit-or-miss affair: just find somehow an anti-derivative F(x) such that $F'(x) = f(x)$, add an arbitrary constant for the indefinite integral $\int f(x)\,dx$, write $F(b) - F(a)$ for the definite integral $\int_a^b f(x)\,dx$, and we are done. As practical guides, without expecting too much, we can see what we can do in inverting the rules and standard forms for derivatives.

In the following development, the indefinite integrals shown are assumed to exist. They are written without the additive constant for convenience; but it must always be borne in mind. Standard forms for derivatives can always be inverted to give standard forms for integrals. The one we need immediately, for algebraic functions, is the inversion of $D\left(\frac{x^{r+1}}{r+1}\right) = \frac{(r+1)x^r}{r+1} = x^r$:

$$\int x^r\,dx = \frac{x^{r+1}}{r+1} \quad (r \text{ rational},\ r \neq -1) \qquad (1)$$


Of the rules, we can easily handle those for a sum, a difference, a multiplicative constant and an additive constant. If u and v are two continuous functions:

$$\int (u \pm v)\,dx = \int u\,dx \pm \int v\,dx \quad \text{matching} \quad D(u \pm v) = Du \pm Dv;$$

$$\int ku\,dx = k\int u\,dx \quad \text{matching} \quad D(ku) = kDu$$

$$\text{and} \quad \int (u + A)\,dx = \int u\,dx + Ax$$

since $\int (u + A)\,dx = \int u\,dx + A\int 1\,dx = \int u\,dx + Ax$ by the sum and multiplicative constant rules and by use of (1) with $r = 0$.

The rule for the derivative of a product is itself not a very simple one: $D(uv) = uDv + vDu$. It is not possible to invert it into a corresponding form for integrals. The nearest we can get is a formula, much used in practical integration, called ‘integration by parts’. Given two functions u and v with derivatives $u'$ and $v'$, start from the property (4) of 10.6:

$$uv = \int D(uv)\,dx = \int (uv' + vu')\,dx = \int uv'\,dx + \int vu'\,dx.$$

Hence, the formula for integration by parts:

$$\int uv'\,dx = uv - \int vu'\,dx \qquad (2)$$

An alternative expression is obtained by writing $u = f(x)$, $u' = f'(x)$, $v' = g(x)$ and $v = \int g(x)\,dx$. So:

$$\int f(x)g(x)\,dx = f(x)\int g(x)\,dx - \int \left\{f'(x)\int g(x)\,dx\right\}dx \qquad (3)$$

The formula (2) or (3) does not serve directly to integrate a product of two functions. It simply passes the buck. The hope, quite often realised in practice with ingenious selection of u and v, is that the integrals on the right-hand side of (2) or (3) are easier to evaluate. An example illustrates:


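From (1), we have

$$\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{x^{3/2}}{3/2} = \tfrac{2}{3}x\sqrt{x} \quad \text{and} \quad \int \frac{dx}{\sqrt{x}} = \int x^{-1/2}\,dx = \frac{x^{1/2}}{1/2} = 2\sqrt{x}.$$

Write $f(x) = 1 + 3x$ and $g(x) = \frac{1}{2\sqrt{x}}$ in (3):

$$\int \frac{1 + 3x}{2\sqrt{x}}\,dx = (1 + 3x)\int \frac{dx}{2\sqrt{x}} - \int \left\{3\int \frac{dx}{2\sqrt{x}}\right\}dx$$

$$= (1 + 3x)\,\tfrac{1}{2}(2\sqrt{x}) - \int \{3 \times \tfrac{1}{2}(2\sqrt{x})\}\,dx$$

$$= (1 + 3x)\sqrt{x} - 3\int \sqrt{x}\,dx$$

$$= (1 + 3x)\sqrt{x} - 3 \cdot \tfrac{2}{3}x\sqrt{x}$$

$$= (1 + x)\sqrt{x}.$$

As a check: $D\{(1 + x)\sqrt{x}\} = \frac{1 + 3x}{2\sqrt{x}}$ as shown in 10.3.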
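The check can also be made numerically. In this sketch (the sample points, the step h and the helper names are illustrative assumptions), the unsimplified result of integrating by parts is compared with $(1 + x)\sqrt{x}$, and a central-difference estimate of the derivative of $(1 + x)\sqrt{x}$ is compared with the original integrand.

```python
import math


def integrand(x):
    return (1 + 3 * x) / (2 * math.sqrt(x))


def by_parts_result(x):
    # (1 + 3x) * sqrt(x) - 3 * (2/3) * x * sqrt(x), before simplification
    return (1 + 3 * x) * math.sqrt(x) - 3 * (2 / 3) * x * math.sqrt(x)


def simplified(x):
    return (1 + x) * math.sqrt(x)


def derivative(func, x, h=1e-5):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


sample_points = [0.5, 1.0, 2.0, 4.0]
for x in sample_points:
    print(x, by_parts_result(x), simplified(x), derivative(simplified, x), integrand(x))
```

At each sample point the first two values coincide, and so do the last two.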


The other rule of derivation, of frequent use in practice, is that for a composite function. If y is a function of u and u a function of x:

$$D_x y = D_u y\,D_x u.$$

Something can be done with this to give a practical rule for integration known as ‘integration by substitution’. Let $y = f(u)$, where $u = g(x)$ with derivative $u' = g'(x)$. Write:

$$\int f(u)\,du = F(u) = F\{g(x)\}$$

treating it either as a function of u or as a composite function of x. As a function of u, $F'(u) = f(u)$. As a function of x, writing $D = D_x$ for derivatives with respect to x, we have:

$$DF\{g(x)\} = D_u F(u)\,Du \quad \text{by the composite function rule}$$

$$= f(u)u' \quad \text{where } u = g(x),\ u' = g'(x)$$

$$\text{i.e.} \quad F\{g(x)\} = \int f(u)u'\,dx.$$

Hence the formula for integration by substitution:

$$\int f(u)\,du = \int f(u)u'\,dx \qquad (4)$$

where u is a function of x with derivative $u'$. More fully:

$$\int f(u)\,du = \int f\{g(x)\}g'(x)\,dx \qquad (5)$$

on making the substitution $u = g(x)$. The use of (4) or (5) is that, to evaluate the integral of f(u), we switch variables from u to x by substituting $u = g(x)$. Again, this is a successful passing of the buck only if the integral obtained on the right-hand side of (4) or (5) is easily evaluated. This hope is often realised by appropriate choice of g(x). The same example illustrates:


$$\int \frac{1 + 3u}{2\sqrt{u}}\,du = \int \frac{1 + 3x^2}{2x}\,2x\,dx \quad \text{on substituting } u = x^2,\ u' = 2x$$

$$= \int (1 + 3x^2)\,dx = \int 1\,dx + 3\int x^2\,dx = x + x^3 \quad \text{by (1)}.$$

Substituting back $u = x^2$: $\int \frac{1 + 3u}{2\sqrt{u}}\,du = (1 + u)\sqrt{u}$ as before. In integration, there may be more than one way of skinning a cat.

The rules for integration can be brought together. Here u and v are continuous functions with integrals. To ensure that the integrals exist in integration by parts, the derivatives $u'$ and $v'$ are also taken as continuous. In integration by substitution, f(u) and $u' = g'(x)$ are both taken as continuous.


Rule                           Indefinite integral*

Additive constant              $\int (u + A)\,dx = \int u\,dx + Ax$   (A constant)
Sum                            $\int (u + v)\,dx = \int u\,dx + \int v\,dx$
Difference                     $\int (u - v)\,dx = \int u\,dx - \int v\,dx$
Multiplicative constant        $\int ku\,dx = k\int u\,dx$   (k constant)
Integration by parts           $\int uv'\,dx = uv - \int vu'\,dx$
Integration by substitution    $\int f(u)\,du = \int f(u)u'\,dx$   where $u = g(x)$

* An arbitrary constant is to be added in each case.

Finally, to stress the connection between indefinite and definite integrals, consider the particular example already used:

$$\int \frac{1 + 3x}{2\sqrt{x}}\,dx = (1 + x)\sqrt{x}.$$

In writing an indefinite integral such as this, we indicate that we have a function of a variable x by using x after the ∫ sign. If we change the variable after the ∫ sign, we keep the same function but change the independent variable to which it relates. So:

$$\int \frac{1 + 3x}{2\sqrt{x}}\,dx = (1 + x)\sqrt{x} \quad \text{and} \quad \int \frac{1 + 3u}{2\sqrt{u}}\,du = (1 + u)\sqrt{u}$$

the first a function of x, the second a function (the same one) of u. For a definite integral, the ‘variable’ used after the ∫ sign does not imply a function of this ‘variable’. It is a ‘dummy variable’, to be labelled in any way convenient. For, the definite integral involves only the ends of the interval [a, b] over which it is taken; it is not a function of a variable x. So:

$$\int_a^b f(x)\,dx = \int_a^b f(u)\,du = \ldots = F(b) - F(a)$$

where F is an anti-derivative of f. In the particular case:

$$\int_1^4 \frac{1 + 3x}{2\sqrt{x}}\,dx = \Big[(1 + x)\sqrt{x}\Big]_1^4 = (1 + 4)\sqrt{4} - (1 + 1)\sqrt{1} = 10 - 2$$

$$\text{i.e.} \quad \int_1^4 \frac{1 + 3x}{2\sqrt{x}}\,dx = 8$$

and the same value is obtained if we start with

$$\int_1^4 \frac{1 + 3u}{2\sqrt{u}}\,du.$$
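The value 8 can be reached in three ways on a computer, as the sketch below suggests. The midpoint-sum helper and the number of strips are illustrative choices. One further step is an assumption not spelled out in the text: when the substitution $u = x^2$ is applied to a definite integral, the ends of the interval change with the variable, so that u from 1 to 4 corresponds to x from 1 to 2.

```python
import math


def midpoint_sum(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] by a sum of n midpoint strips."""
    width = (b - a) / n
    return sum(f(a + (r + 0.5) * width) for r in range(n)) * width


def anti_derivative(t):
    return (1 + t) * math.sqrt(t)


# (a) F(b) - F(a) with the anti-derivative found above
exact = anti_derivative(4.0) - anti_derivative(1.0)

# (b) the definite integral as a limit; the label of the dummy variable is immaterial
direct = midpoint_sum(lambda u: (1 + 3 * u) / (2 * math.sqrt(u)), 1.0, 4.0)

# (c) after substituting u = x^2, u' = 2x: integrand 1 + 3x^2, with x from 1 to 2
substituted = midpoint_sum(lambda x: 1 + 3 * x**2, 1.0, 2.0)

print(exact, direct, substituted)
```

All three values agree with 8 to the accuracy of the approximation.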


10.8. Derivatives and integrals as operators. When introducing the 
derivative notation (10.2), we remarked that ‘D’ in DF (x) could be 
taken as an operator and read ‘get the derivative of’. We have a 
transformation, changing a given function into another function, 
i.e. the derived function. This follows the usage of 6.4. Some questions 
immediately suggest themselves. Can integration also be written in 
operator form? If so, is the operator the inverse of D? Can a whole 
group of such operators be written? 

In pursuing these matters, we ignore for the moment the arbitrary constants which arise in integration. For some continuous f(x), write $\int f(x)\,dx = F(x)$, meaning $DF(x) = f(x)$. Let the transformation from f(x) to an integral or anti-derivative F(x) be written in operator form: $Ef(x) = F(x)$. Here ‘E’ is read ‘integrate’ or ‘get an anti-derivative of’; $Ef(x) = F(x)$ means that getting an anti-derivative of f(x) produces F(x). Then E and D are easily related:

If $Ef(x) = F(x)$, so that $DF(x) = f(x)$, then $D\{Ef(x)\} = DF(x) = f(x)$.




In the usual notation for transformations, the product DE is written for successive applications of the operators, E first and D second. So: $DEf(x) = f(x)$. Further, $E\{DF(x)\} = Ef(x) = F(x)$, i.e. $EDF(x) = F(x)$ where ED is the succession of D first and E second. Introduce I for the identity operator, leaving a function unchanged: $If(x) = f(x)$. We now have three operators to apply one after the other and they are related: $DE = ED = I$. Hence D and E are commutative in multiplication, interpreted as successive application, and one is the inverse of the other. We can write E as $D^{-1}$, the inverse of D: $DD^{-1} = D^{-1}D = I$.

We now have a new notation for integration, in many ways more convenient than the old: $D^{-1} = \int \ldots\,dx$. So: $\int f(x)\,dx = D^{-1}f(x)$.

The idea of successive application of the operators D and $D^{-1}$ can be pursued further. Suppose F(x) has derivative $F'(x)$ and suppose that this derived function has a derivative in its turn. This is the second derivative of F(x), written $F''(x)$. If $F'(x)$ is interpreted as the ‘velocity’ of F(x), then $F''(x)$ is the ‘acceleration’ (see 10.9 Ex. 7). In operator form:

$$F''(x) = DF'(x) = DDF(x) = D^2F(x).$$

If each derived function always has a derivative, the process continues:

DEFINITION: If F(x) has derivative $F'(x)$, if $F'(x)$ has derivative $F''(x)$, ... for n stages (n a positive integer), then there is an nth derivative:

$$D^nF(x) = F^{(n)}(x).$$

Suppose f(x) is continuous so that $\int f(x)\,dx$ exists and is continuous. Then $\int f(x)\,dx$ has an integral, the second integral of f(x), written $\iint f(x)\,dx\,dx$:

$$\iint f(x)\,dx\,dx = D^{-1}\int f(x)\,dx = D^{-1}D^{-1}f(x) = D^{-2}f(x).$$

Write $F(x) = \int f(x)\,dx$ and $G(x) = \int F(x)\,dx = \iint f(x)\,dx\,dx$. Then $DF(x) = f(x)$ and $DG(x) = F(x)$. Hence $D^2G(x) = DF(x) = f(x)$. But $G(x) = D^{-2}f(x)$. Hence $D^{-2}f(x) = G(x)$ implies $D^2G(x) = f(x)$, i.e. $D^2D^{-2}f(x) = f(x)$ and $D^{-2}D^2G(x) = G(x)$. So $D^{-2}$ and $D^2$ are inverse: $D^2D^{-2} = D^{-2}D^2 = I$.

Since f(x) is continuous, the process of successive integration continues:

DEFINITION: If f(x) is continuous, the nth integral

$$D^{-n}f(x) = \int\!\!\int \ldots \int f(x)\,dx\,dx \ldots dx$$

exists for any positive integer n, and $D^{-n}$ is the inverse of $D^n$.

As obvious conventions, write $D^0 = I$ and $D^1 = D$. A general operator $D^n$ is obtained for any integer n, positive, zero or negative:

$$\ldots,\ D^{-3},\ D^{-2},\ D^{-1},\ I,\ D,\ D^2,\ D^3,\ \ldots$$

as a set of operators or transformations. If n is positive, then $D^n = D \times D \times \ldots \times D$ (n times) for the nth derivative. If n is negative, write it $-m$ so that $D^{-m} = D^{-1} \times D^{-1} \times \ldots \times D^{-1}$ (m times) for the mth integral. Further, the operators can be mixed, and are commutative, in successive applications (10.9 Ex. 29). In general:

$$D^mD^n = D^{m+n} \quad (m \text{ and } n \text{ integers}).$$

The set of operators is a group under multiplication, with an identity $I = D^0$ and with every member of the set having an inverse: $D^nD^{-n} = I$. There is a one-one correspondence between the set $D^n$ and the integers n. The group $D^n$ under multiplication is isomorphic with the group of integers under addition.

There are now qualifications to be noted. Run through the sequence of higher and higher derivatives of a given function F(x):

$$F(x),\ DF(x),\ D^2F(x),\ \ldots \quad \text{or} \quad F(x),\ F'(x),\ F''(x),\ \ldots.$$

This can only be done if the function at each stage does have a derivative. The sequence, in fact, may be halted at any stage for lack of a derivative. To write the sequence up to a stage n, we must assume (or ensure) that the first n derivatives of F(x) exist. On the other hand, given a continuous function f(x), we can always write the sequence:

$$f(x),\ D^{-1}f(x),\ D^{-2}f(x),\ \ldots \quad \text{or} \quad f(x),\ \int f(x)\,dx,\ \iint f(x)\,dx\,dx,\ \ldots.$$

The difficulty here is a different one. At each stage, an arbitrary constant is introduced so that, by the nth stage, there are n of them:

$$D^{-1}f(x) = \int f(x)\,dx + A_1$$

$$D^{-2}f(x) = \iint f(x)\,dx\,dx + A_1x + A_2$$

$$D^{-3}f(x) = \iiint f(x)\,dx\,dx\,dx + A_1x^2 + A_2x + A_3$$




As a simple example, take the function $x^2$. Successive derivatives do exist:

$$x^2;\quad Dx^2 = 2x;\quad D^2x^2 = 2;\quad D^3x^2 = D^4x^2 = \ldots = 0.$$

Successive integrals can always be written but they involve a mounting set of additive constants:

$$x^2;\quad D^{-1}x^2 = \frac{x^3}{3} + A_1;\quad D^{-2}x^2 = \frac{x^4}{12} + A_1x + A_2;$$

$$D^{-3}x^2 = \frac{x^5}{60} + A_1x^2 + A_2x + A_3;\quad \ldots$$

In using the operator form, we can omit the arbitrary constants, e.g. $D^{-3}x^2 = \frac{x^5}{60}$, as long as we remember the qualification that they need to be inserted to get from an integral to the general integral.
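For polynomials the operators can be imitated on a computer. The following is a minimal sketch under stated assumptions: a polynomial is stored as a list of coefficients in ascending powers of x, the names `D`, `D_inv` and `power` are purely illustrative, and `D_inv` always takes the arbitrary constant as zero, which is the convention of omitting the constants just described.

```python
from fractions import Fraction


def D(coeffs):
    """Derivative of c0 + c1*x + c2*x^2 + ..., given as the list [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [Fraction(0)]


def D_inv(coeffs):
    """One anti-derivative of the polynomial, with the arbitrary constant taken as 0."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


def power(n, coeffs):
    """Apply D n times if n > 0, D_inv |n| times if n < 0, and nothing if n = 0."""
    op = D if n > 0 else D_inv
    for _ in range(abs(n)):
        coeffs = op(coeffs)
    return coeffs


x_squared = [Fraction(0), Fraction(0), Fraction(1)]
x_squared_plus_5 = [Fraction(5), Fraction(0), Fraction(1)]

third_derivative = power(3, x_squared)    # D^3 x^2
third_integral = power(-3, x_squared)     # D^-3 x^2, constants omitted
d_after_d_inv = power(1, power(-1, x_squared))
d_inv_after_d = power(-1, power(1, x_squared_plus_5))

print(third_derivative, third_integral)
print(d_after_d_inv, d_inv_after_d)
```

Here $D^3x^2$ is zero and $D^{-3}x^2$ has the single coefficient 1/60 on $x^5$. Applying $D$ after $D^{-1}$ restores $x^2$ exactly, while applying $D^{-1}$ after $D$ to $x^2 + 5$ returns $x^2$ without the 5: the relation $D^{-1}D = I$ holds only when the arbitrary constants are ignored, which is the qualification made in the text.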


10.9. Exercises. 

1. A body travels $x = 100t - 5t^2$ feet in t minutes. Five minutes have elapsed; find the average speed over 1 minute more, over 0.1 minute, over 0.01 minute. Write the average speed over h minutes and deduce the velocity after 5 minutes. Generally, show $v = 100 - 10t$ and deduce that the formula for x applies to a body starting off at 100 feet per minute, with decreasing velocity, coming to rest after 10 minutes.

2. As in 9.9 Ex. 5, show that $y = \frac{x^2 - a^2}{x - a} = x + a$ $(x \neq a)$ and $y \to 2a$ as $x \to a$. Interpret as the derivative of $y = x^2$ at $x = a$.

3. Show that $f(x) = 1 - \sqrt{(1 - x)^2}$ is continuous at $x = 1$. Show that

$$\phi(h) = \frac{f(1 + h) - f(1)}{h}$$

is $+1$ (h negative) or $-1$ (h positive). Deduce that the limit process for $\phi(h)$ as $h \to 0$ is $F(N) = [-1, 1]$ for all neighbourhoods N, i.e. that $\phi(h)$ has no limit. Represent f(x) graphically as a ‘curve’, with a sharp point where the function is continuous but without derivative. Show that $f'(x) = 1$ for $x < 1$, $f'(x)$ not defined at $x = 1$ and $f'(x) = -1$ for $x > 1$.

4. Tangent and normal. From (3) of 8.6, show that the tangent at $P(x_1, y_1)$ to the curve $y = f(x)$ has equation $y - y_1 = f'(x_1)(x - x_1)$ if the tangent exists, and that the line perpendicular to the tangent at P (called the normal at P) is $(x - x_1) + f'(x_1)(y - y_1) = 0$. Write the equation of the tangent and normal to the parabola $y = x^2$ at the point where $x = x_1$.

5. Given $D(x) = 1$ and $D(x^n) = nx^{n-1}$, use the product rule to establish that $D(x^{n+1}) = (n + 1)x^n$. Hence prove $D(x^n) = nx^{n-1}$ by mathematical induction (n a positive integer). From $D\left(\frac{1}{x}\right) = -\frac{1}{x^2}$, show $D\left(\frac{1}{x^n}\right) = -\frac{n}{x^{n+1}}$ by the composite function rule.

6. It is given that $D(x^n) = nx^{n-1}$, n a positive integer. Take p and q as positive integers. Show that $D(\sqrt[q]{x}) = \frac{1}{q\,\sqrt[q]{x^{q-1}}}$ by the inverse function rule. Use the composite function rule to deduce $D(\sqrt[q]{x^p})$.

7. Second derivative as an acceleration. Distance travelled in time t is given as $x = f(t)$ so that velocity $v = f'(t)$. Show that $f''(t)$ is the limit of the average rate of change of v at time t and interpret as acceleration. Show that the acceleration is constant for the motion of (i) of 10.1 and that it is negative (deceleration) in Ex. 1.

8. Obtain $D\left(\frac{1}{x}\right)$ from first principles by writing $\frac{f(x + h) - f(x)}{h} = -\frac{1}{x(x + h)}$ for $f(x) = \frac{1}{x}$.

9. Establish from the definition the quotient rule for derivatives.

10. Show that the derivatives of 1 τὰ 3; ana et vent, and Vi = 
are respectively: 1 +22; πα τη Θεθις τοι, and ers Show 
also that νὰ + a and Nes. are both anti-derivatives of 4} aii Check by 


νὰ /x σα 
showing that they differ by a constant. 

11. A line has equation $y = mx + c$. Show that $Dy = m$, $D^2y = 0$; and interpret the slope of the line as the slope of a tangent. Conversely, given $D^2y = 0$, write anti-derivatives to obtain $y = mx + c$, i.e. to get $y = mx + c$ as the solution of $D^2y = 0$.

12. If $y = \frac{2x}{1 - x^2}$ show that y, Dy and $D^2y$ are all positive in the open interval $(0 < x < 1)$ and hence that y increases at an increasing rate from $y = 0$ at $x = 0$. What can be said when $x \to 1$? Show also that, if $y = 1 + \frac{2x + 3}{x^2 - 1}$, then Dy is zero at $x = -2.62$ and $x = -0.38$ (approximately), positive between these values and negative elsewhere. (See Fig. 3.9.)

13. A freely-falling body. A body falls freely from rest; Galileo’s law is that $x = \frac{1}{2}gt^2$ (g the gravitational constant) is the distance travelled in time t. Show that velocity v increases steadily, with constant acceleration g. Conversely, if the acceleration is given as a constant g, show that $v = u + gt$ and $x = ut + \frac{1}{2}gt^2$, where u is the initial velocity (at $t = 0$, $x = 0$). Further, if the body is thrown upwards with velocity u, show that the upward velocity v and the distance x travelled upwards after time t are $v = u - gt$, $x = ut - \frac{1}{2}gt^2$; and deduce that the greatest height obtained is $u^2/2g$.

*14. A ball is thrown into the air with velocity u at an angle α° to the horizontal. Show that horizontally (with no acceleration) the distance travelled is $x = ut\cos\alpha$, and vertically (with downward acceleration g) it is $y = ut\sin\alpha - \frac{1}{2}gt^2$, after time t. Deduce that the ball returns to the ground a distance $x = \frac{u^2}{g}\sin 2\alpha$ from the starting point ($\sin 2\alpha = 2\sin\alpha\cos\alpha$), and that a maximum distance is achieved if the ball is thrown at 45°. Show that the equation of the path of the ball through the air is obtained (by eliminating t) as $y = x\tan\alpha - \frac{g}{2u^2\cos^2\alpha}\,x^2$, which is a parabola.

15. Marginal revenue. The demand for the product of a firm (x thousand items per week) at specified prices (£p per item) is given by $x + \frac{1}{2}p = 1$. Write R (in £ thousand) as the total revenue obtained by selling x (thousand items) and show that $\frac{dR}{dx} = 2(1 - 2x)$. Interpret this as marginal revenue; what units does it appear in? Put $\frac{dR}{dx} = 0$, achieved at output of 500 items per week, and interpret graphically.

16. Maximum profits. In the firm of Ex. 15, the total cost C of output x (thousand items) is $C = 2x^2$ (£ thousand), and marginal cost is $\frac{dC}{dx} = 4x$. Write $\Pi = R - C$ for profits at output x and show that $\Pi$ is greatest at the output (250 items per week) where marginal cost = marginal revenue.

17. Establish the following areas (definite integrals) as limits of sequences 
as in 10.4: 


[id ΞΕ dx =3; "at 4--Ἶ; [29 dx τῷ 
1 c= > i” τ 9? 1 =e? 1 σ 4° 


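Generalise by using the standard form for $\int x^r\,dx$:

$$\int_a^b dx = b - a;\quad \int_a^b x\,dx = \tfrac{1}{2}(b^2 - a^2);\quad \int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3);\quad \int_a^b x^3\,dx = \tfrac{1}{4}(b^4 - a^4).$$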
18. For $y = 1 - \sqrt{(1 - x)^2}$ defined on $0 \le x \le 2$ (see Ex. 3), obtain

$$\int_0^2 y\,dx = \int_0^1 y\,dx + \int_1^2 y\,dx = 1$$

and check graphically by means of areas of triangles.

19. If $y = x^3 - 3x^2 + 2x$ is represented by a curve, as in Fig. 10.5d, interpret $\int_0^2 y\,dx = 0$ in terms of areas under the curve. If C is the area above Ox from 0 to 1 and D the area under Ox from 1 to 2, show that $C = D = \frac{1}{4}$.


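20. From $\frac{x - 1}{x\sqrt{x}} = x^{-1/2} - x^{-3/2}$, show that $\int \frac{x - 1}{x\sqrt{x}}\,dx = 2\left(\sqrt{x} + \frac{1}{\sqrt{x}}\right) + \text{constant}$. Check from Ex. 10.

21. Make the substitution $u = 1 - x^2$ to show that $\int \frac{x\,dx}{(1 - x^2)^2} = \frac{1}{2(1 - x^2)} + \text{constant}$. But $\int \frac{x\,dx}{(1 - x^2)^2} = \frac{x^2}{2(1 - x^2)}$ apart from the additive constant (by Ex. 10); is this consistent?

22. By integration by substitution with $u = x - 1$, show that

$$\int \sqrt{x - 1}\,dx = \tfrac{2}{3}(x - 1)^{3/2} \quad \text{and} \quad \int \frac{dx}{\sqrt{x - 1}} = 2\sqrt{x - 1} \quad (+ \text{ constant in each case}).$$

Then, by integration by parts, show

$$\int \frac{x\,dx}{\sqrt{x - 1}} = 2x\sqrt{x - 1} - \tfrac{4}{3}(x - 1)^{3/2} + \text{constant}.$$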


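*23. Show that $\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx = \int_0^1 x^{n-1}(1 - x)^{m-1}\,dx$ where m and n are positive integers. (Substitute $u = 1 - x$.) Further, show that

$$D\{x^m(1 - x)^n\} = mx^{m-1}(1 - x)^{n-1} - (m + n)x^m(1 - x)^{n-1}$$

and hence that

$$\int x^m(1 - x)^{n-1}\,dx = \frac{m}{m + n}\int x^{m-1}(1 - x)^{n-1}\,dx - \frac{1}{m + n}\,x^m(1 - x)^n.$$

Deduce that

$$\int_0^1 x^m(1 - x)^{n-1}\,dx = \frac{m}{m + n}\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx$$

and

$$\int_0^1 x^{m-1}(1 - x)^n\,dx = \frac{n}{m + n}\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx.$$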


24. Convergent infinite integrals. If f(x) is continuous $x \ge a$ and if $\int_a^k f(x)\,dx$ has limit L as $k \to \infty$, define the infinite integral $\int_a^\infty f(x)\,dx = L$ and say that it is convergent. Interpret the convergent infinite integral in terms of areas under
the curve y =f(x). Show that ie σας Ξ- and illustrate graphically. 


25. From Ex. 21, show that

$$\int_h^k \frac{x\,dx}{(1 - x^2)^2} = \frac{1}{2}\left(\frac{1}{h^2 - 1} - \frac{1}{k^2 - 1}\right) \quad \text{for } k > h > 1.$$

Deduce that $\int_2^\infty \frac{x\,dx}{(1 - x^2)^2}$ is convergent (with value $\frac{1}{6}$) but that neither $\int_1^\infty \frac{x\,dx}{(1 - x^2)^2}$ nor $\int_0^\infty \frac{x\,dx}{(1 - x^2)^2}$ can be written. (The integral is not convergent as $h \to 1$.)

*26. Beta functions. If m and n are positive integers, write

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx$$

and use the results of Ex. 23 to establish that:

$$B(m, n) = B(n, m) \quad \text{and} \quad B(m + 1, n) = \frac{m}{m + n}\,B(m, n).$$

Deduce that $B(1, 1) = 1$, $B(m, 1) = \frac{1}{m}$, $B(1, n) = \frac{1}{n}$; and that

$$B(m, n) = \frac{(m - 1)!\,(n - 1)!}{(m + n - 1)!} \quad (m > 1,\ n > 1).$$

(The Beta function can be defined similarly for m and n any positive real values.)

*27. Transformation of integrals. The formula for integration by substitution can be regarded as the transformation of one integral into another. Define B(m, n) as the integral of Ex. 26 and transform by $x = \frac{y}{1 + y}$. Show first that

$$\int_0^{1-h} x^{m-1}(1 - x)^{n-1}\,dx = \int_0^k \frac{y^{m-1}}{(1 + y)^{m+n}}\,dy \quad \text{where } k = \frac{1 - h}{h} \to \infty \text{ as } h \to 0.$$

Deduce that

$$B(m, n) = \int_0^\infty \frac{x^{m-1}}{(1 + x)^{m+n}}\,dx$$

as a convergent infinite integral.


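*28. Integration by parts for infinite integrals can be written:

$$\int_a^\infty uv'\,dx = \Big[uv\Big]_a^\infty - \int_a^\infty vu'\,dx \quad \text{where} \quad \Big[uv\Big]_a^\infty = \lim_{x \to \infty}(uv) - (uv)_{x=a}.$$

Illustrate its application by showing that:

$$\int_0^\infty \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx = \left[\frac{1}{n}\,\frac{x^n}{(1 + x)^{m+n}}\right]_0^\infty + \frac{m + n}{n}\int_0^\infty \frac{x^n}{(1 + x)^{m+n+1}}\,dx$$

i.e. that

$$\int_0^\infty \frac{x^n}{(1 + x)^{m+n+1}}\,dx = \frac{n}{m + n}\int_0^\infty \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx.$$

Deduce that $B(m, n + 1) = \frac{n}{m + n}\,B(m, n)$. Use the symmetry: $B(m, n) = B(n, m)$, to show that this is the same as

$$B(m + 1, n) = \frac{m}{m + n}\,B(m, n).$$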


29. From the definition of the operator D, show that

$$DDD^{-1}f(x) = DD^{-1}Df(x) = D^{-1}DDf(x) = Df(x).$$

Examine other such combinations, generalise and show that

$$D^mD^nf(x) = D^{m+n}f(x)$$

for any integral m and n.

30. Leibniz’s Theorem. If the functions u and v have derivatives of any desired order, use $D(uv) = uDv + vDu$ to show that $D^2(uv) = uD^2v + 2DuDv + vD^2u$. Hence prove by mathematical induction, for any positive integer n:

$$D^n(uv) = uD^nv + \binom{n}{1}Du\,D^{n-1}v + \binom{n}{2}D^2u\,D^{n-2}v + \ldots + \binom{n}{n-1}D^{n-1}u\,Dv + vD^nu.$$


CHAPTER 11 


EXPANSIONS 


11.1. Taylor’s series. A function f(x) of a real variable is defined over the interval [a, b]. The present development aims at the derivation of an expansion of f(x), approximately as a polynomial in ascending powers of x of degree n, exactly as an infinite series of such powers. The basic theorem, the Mean Value Theorem for derivatives, requires only that f(x) is continuous over the whole interval (for $a \le x \le b$) and that $f'(x)$ exists within the interval (for $a < x < b$). It is not necessary to assume that $f'(x)$ exists at $x = a$ and at $x = b$ (though it usually does).* Nor is it necessary to assume that $f'(x)$ is continuous (though it usually is). The theorem, reached in two stages, makes use of a theorem on continuity (9.8). Since f(x) is continuous over [a, b], its range is an interval [c, d], every value being achieved for some x of [a, b]. In particular, f(x) is bounded and attains its GLB c and its LUB d in the interval.
We begin with what is generally known as Rolle’s Theorem:

THEOREM: If f(x) is continuous $a \le x \le b$ and if $f'(x)$ exists $a < x < b$, then $f(a) = f(b)$ implies that α exists $(a < \alpha < b)$ so that $f'(\alpha) = 0$.

The implication of this result is seen clearly in graphical terms. In a graph of the curve $y = f(x)$ referred to axes Oxy (Fig. 11.1a), let A and B be two points at the same height on the curve. Then there is some intermediate point P at which the tangent is horizontal: $f'(\alpha) = 0$. P may not be unique; there can be two (or more) such points, P and P′ as illustrated. The theorem states only that there is at least one such point.

[Fig. 11.1a]

The proof is as follows:


* If the wider assumption is made that $f'(x)$ exists everywhere over the interval $(a \le x \le b)$, then f(x) is necessarily continuous over the interval.


1] EXPANSIONS 301 
Write $\phi(x) = f(x) - f(a)$ so that $\phi'(x) = f'(x)$.

Hence: $\phi(a) = \phi(b) = 0$ given $f(a) = f(b)$.

If $\phi(x) = 0$ throughout [a, b], then $\phi'(x) = 0$ everywhere, i.e. $f'(x) = 0$ everywhere and there is nothing more to prove. On the other hand, if $\phi(x) \neq 0$ at some points, there are positive values of $\phi(x)$, or negative values, or both. Suppose $\phi(x) > 0$ somewhere in [a, b]. Then the range [c, d] of $\phi(x)$ has $d > 0$, and there is some α $(a < \alpha < b)$ giving the LUB $\phi(\alpha) = d > 0$. It can then be shown that $\phi'(\alpha) = 0$. For, if $\phi'(\alpha) > 0$, $\frac{1}{h}\{\phi(\alpha + h) - \phi(\alpha)\}$ converges to a positive limit as $h \to 0$ and $\frac{1}{h}\{\phi(\alpha + h) - \phi(\alpha)\} > 0$ for some sufficiently small positive h. This means that $\phi(\alpha + h) > \phi(\alpha)$ which is ruled out since $\phi(\alpha)$ is the LUB. Equally, if $\phi'(\alpha) < 0$, $\frac{1}{h}\{\phi(\alpha + h) - \phi(\alpha)\} < 0$ for some sufficiently small negative h. This means that $\phi(\alpha + h) > \phi(\alpha)$ again, and the case is ruled out. Hence $\phi'(\alpha) = 0$. The same result follows if $\phi(x) < 0$ somewhere in [a, b]. Hence, in all cases, $\phi'(\alpha) = 0$, i.e. $f'(\alpha) = 0$ for some α $(a < \alpha < b)$. Q.E.D.
The Mean Value Theorem for derivatives then follows:

THEOREM: If f(x) is continuous $a \le x \le b$ and if $f'(x)$ exists $a < x < b$, then α exists $(a < \alpha < b)$ so that $f'(\alpha) = \frac{f(b) - f(a)}{b - a}$.

The meaning of the result in graphical terms is clear from Fig. 11.1b. If A and B are any two points on the curve $y = f(x)$, then there is some point P (perhaps several such) at which the tangent is parallel to the chord AB. The slope of the tangent is $f'(\alpha)$ where $x = \alpha$ corresponds to P. The slope of the chord is:

$$\frac{QB}{AQ} = \frac{NB - NQ}{MN} = \frac{NB - MA}{ON - OM} = \frac{f(b) - f(a)}{b - a}.$$

[Fig. 11.1b]

Rolle’s Theorem, a particular case, is used to prove the more general case:

Write $\phi(x) = f(x) - f(a) - (x - a)\,\frac{f(b) - f(a)}{b - a}$

with $\phi'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$.

But $\phi(a) = \phi(b) = 0$.

So, by Rolle’s Theorem, there is an α $(a < \alpha < b)$ such that $\phi'(\alpha) = 0$, i.e. such that $f'(\alpha) - \frac{f(b) - f(a)}{b - a} = 0$. Q.E.D.


Notice that, if $f(x)$ is a specified function, then it may well be
possible to locate $\alpha$ as a particular value. For example, if $f(x)=x^2$
with derivative $f'(x)=2x$, defined over the interval $[1,2]$, then

$$\frac{f(b)-f(a)}{b-a}=\frac{2^2-1^2}{2-1}=3\quad\text{for } a=1,\ b=2.$$

Hence $f'(\alpha)=\dfrac{f(b)-f(a)}{b-a}$ at $\alpha=\tfrac32$ in the interval $[a,b]=[1,2]$.


The importance of the theorem, however, lies in the fact that we 
know there is a value « whatever function f(x) is taken, subject to 
the conditions named. 
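
As an illustration of locating $\alpha$ numerically, the following is a minimal sketch and not part of the original text. The helper name `mean_value_point` is hypothetical, and the bisection step assumes that $f'(x)$ minus the chord slope changes sign between $a$ and $b$, which holds for $f(x)=x^2$ on $[1,2]$.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    """Return (chord slope, one alpha in [a, b] with df(alpha) = chord slope).

    Assumption for this sketch: df(x) - slope changes sign between a and b,
    so plain bisection is sufficient.
    """
    slope = (f(b) - f(a)) / (b - a)

    def g(x):
        return df(x) - slope

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change of f'(x) - slope")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid          # the sign change lies in [lo, mid]
        else:
            lo = mid          # the sign change lies in [mid, hi]
    return slope, (lo + hi) / 2


# The example of the text: f(x) = x^2 on [1, 2].
slope, alpha = mean_value_point(lambda x: x**2, lambda x: 2 * x, 1.0, 2.0)
print(slope, alpha)
```

For this function the chord slope is 3 and the located point agrees with $\alpha=\tfrac32$ to within the tolerance.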

The Mean Value Theorem can be expressed in an equivalent form. 
Write $b=a+h$ so that $h=b-a>0$. Then, under the named conditions, we have:

$$f'(\alpha)=\frac{f(a+h)-f(a)}{h}\quad\text{for some }\alpha,\ a<\alpha<a+h.$$

Write $\alpha=a+\theta h$, where $\theta$ is a real number such that $0<\theta<1$:

$$\frac{f(a+h)-f(a)}{h}=f'(a+\theta h)$$

i.e.
$$f(a+h)=f(a)+hf'(a+\theta h).$$

The same result is true if $h$ is negative, say $h=-k$ ($k>0$). Take the
interval in question as $[a-k,\,a]$ so that the above result modifies to:

$$f(a)=f(a-k)+kf'(a-\theta k)$$
i.e.
$$f(a)=f(a+h)-hf'(a+\theta h)$$
i.e.
$$f(a+h)=f(a)+hf'(a+\theta h)\quad\text{as before.}$$

An important extension to Taylor's Theorem now follows:


THEOREM: If $f(x), f'(x), \ldots f^{(n)}(x)$ are continuous $a\le x\le a+h$ and
if $f^{(n+1)}(x)$ exists $a<x<a+h$, then

$$f(a+h)=f(a)+hf'(a)+\frac{h^2}{2!}f''(a)+\ldots+\frac{h^n}{n!}f^{(n)}(a)+R_n,$$

where
$$R_n=\frac{h^{n+1}}{(n+1)!}f^{(n+1)}(a+\theta h)\quad(0<\theta<1).$$


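Moreover, as before, the result also holds if $h$ is negative as well as
positive. The proof is on exactly the same lines as that of the Mean
Value Theorem, but it starts with a more complicated function. If
$b=a+h$, write:

$$F(x)=f(b)-f(x)-(b-x)f'(x)-\frac{(b-x)^2}{2!}f''(x)-\ldots-\frac{(b-x)^n}{n!}f^{(n)}(x).$$

So
$$F'(x)=-f'(x)+\{f'(x)-(b-x)f''(x)\}+\Big\{(b-x)f''(x)-\frac{(b-x)^2}{2!}f'''(x)\Big\}+\ldots
+\Big\{\frac{(b-x)^{n-1}}{(n-1)!}f^{(n)}(x)-\frac{(b-x)^n}{n!}f^{(n+1)}(x)\Big\}$$
$$=-\frac{(b-x)^n}{n!}f^{(n+1)}(x)\quad\text{(all other terms cancelling).}$$

Write:
$$\phi(x)=F(x)-\Big(\frac{b-x}{b-a}\Big)^{n+1}F(a).$$

So:
$$\phi'(x)=F'(x)+(n+1)\frac{(b-x)^n}{(b-a)^{n+1}}F(a)
=(n+1)\frac{(b-x)^n}{(b-a)^{n+1}}\Big\{F(a)-\frac{(b-a)^{n+1}}{(n+1)!}f^{(n+1)}(x)\Big\}.$$

Now $\phi(a)=\phi(b)=0$ so that, by Rolle's Theorem, there is an $\alpha$
($a<\alpha<b$) such that $\phi'(\alpha)=0$, i.e. such that

$$F(a)-\frac{(b-a)^{n+1}}{(n+1)!}f^{(n+1)}(\alpha)=0.$$

Put back $b=a+h$, i.e. $b-a=h$ and $\alpha=a+\theta h$ ($0<\theta<1$):

$$F(a)=R_n\quad\text{where } R_n=\frac{h^{n+1}}{(n+1)!}f^{(n+1)}(a+\theta h)$$
and
$$F(a)=f(a+h)-f(a)-hf'(a)-\frac{h^2}{2!}f''(a)-\ldots-\frac{h^n}{n!}f^{(n)}(a).$$
Q.E.D.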


From the point of view of the expansion of f(x), the useful case of 
Taylor's Theorem is that with $a=0$, $h=x$:

$$f(x)=f(0)+f'(0)x+f''(0)\frac{x^2}{2!}+\ldots+f^{(n)}(0)\frac{x^n}{n!}+R_n \tag{1}$$

where
$$R_n=f^{(n+1)}(\theta x)\frac{x^{n+1}}{(n+1)!}\quad(0<\theta<1). \tag{2}$$

$R_n$ can be written in various ways; another one (11.9 Ex. 4) is the
following:

$$R_n=f^{(n+1)}(\theta x)(1-\theta)^n\frac{x^{n+1}}{n!}\quad(0<\theta<1). \tag{3}$$


The conditions for writing (1), to make Taylor's Theorem valid, are
that $f(x), f'(x), \ldots f^{(n)}(x)$ are continuous over the interval $[0,x]$ and
that $f^{(n+1)}(x)$ exists within the interval. To allow for various positive
and negative values of $x$, we usually write (1) for $|x|<r$, where $r$ is
selected so that the function and its first $n$ derivatives are continuous
over the interval $[-r,r]$ about $x=0$ and so that the $(n+1)$th derivative
exists within the interval.
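
To see how (1) and (2) behave numerically, here is a short sketch that is not part of the original text. It assumes the exponential function as the example (every derivative of $e^x$ at $x=0$ equals 1), uses Python's `math.exp` for the true value, and the helper names are hypothetical. Since $0<\theta<1$, the factor $f^{(n+1)}(\theta x)=e^{\theta x}$ in (2) is at most $e^{|x|}$, which gives a bound on $|R_n|$.

```python
import math


def taylor_poly_exp(x, n):
    """f(0) + f'(0)x + ... + f^(n)(0) x^n / n! for f = exp (all derivatives at 0 are 1)."""
    return sum(x**r / math.factorial(r) for r in range(n + 1))


def lagrange_bound_exp(x, n):
    """Upper bound on |R_n| from form (2), using |f^(n+1)(theta x)| <= e^|x|."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)


x = 0.5
for n in (1, 2, 4, 8):
    approx = taylor_poly_exp(x, n)
    remainder = math.exp(x) - approx
    # The actual remainder should never exceed the bound.
    print(n, approx, remainder, lagrange_bound_exp(x, n))
```

The printed remainder shrinks rapidly with $n$ and stays below the bound, which is the behaviour the text requires of $R_n$.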

We can now look ahead a little. Consider the sequence or series of 
terms 


$$f(0)+f'(0)x+f''(0)\frac{x^2}{2!}+\ldots+f^{(n)}(0)\frac{x^n}{n!}+\ldots$$

where the '+' signs indicate that we propose to add the terms. This
is called Taylor's series. What we have shown in (1) is that the sum
of Taylor's series up to and including the term in $x^n$ is an approximation
to $f(x)$, provided that $R_n$ given by (2) or (3) is small. In this case,
we have an approximate representation of the function $f(x)$, whatever
its form may be, as a polynomial of degree $n$ in $x$. We would like
to go further and say that $f(x)$ is exactly represented by the sum of
Taylor's series continued indefinitely (the sum of the 'infinite series').
For this, we look for $R_n\to 0$ as $n\to\infty$. This is, in fact, the position we
eventually attain. It all turns on what can be said about $R_n$,
i.e. the remainder or difference between $f(x)$ and its polynomial
approximation.

One further result can be added, the Mean Value Theorem for
integrals:

THEOREM: If $f(x)$ is continuous $a\le x\le b$, then $\alpha$ exists ($a\le\alpha\le b$) so
that:

$$\int_a^b f(x)\,dx=f(\alpha)(b-a).$$


Things are simpler for integrals and all we need is the continuity of
the function over the interval in question. The graphical interpretation
is clear enough. In Fig. 11.1c, which has $f(x)$ positive in $[a,b]$, the area
under the curve $y=f(x)$ between A and B is equal to the area of a rectangle
UCDN given by the height of the curve at some point P. The first area is
$\int_a^b f(x)\,dx$; the rectangular area is $f(\alpha)(b-a)$. The
proof comes directly from the definition of the integral as an area.

[Fig. 11.1c]

Since $f(x)$ is continuous over $[a,b]$, it ranges over an interval
$[c,d]$: $c\le f(x)\le d$. The area $\int_a^b f(x)\,dx$ is not greater than the rectangle
$d(b-a)$, $d$ being the LUB of $f(x)$, and it is not less than the rectangle
$c(b-a)$. So:

$$c(b-a)\le\int_a^b f(x)\,dx\le d(b-a).$$

But $f(x)$ attains every value between $c$ and $d$. Therefore there is some
$\alpha$ ($a\le\alpha\le b$) so that $f(\alpha)(b-a)$ takes a specified value between
$c(b-a)$ and $d(b-a)$, i.e. $\alpha$ exists so that $f(\alpha)(b-a)=\int_a^b f(x)\,dx$. Q.E.D.
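
The theorem can be checked numerically. The following sketch is not part of the original text: it takes $f(x)=x^2$ on $[0,3]$ as an assumed example, approximates the integral by the midpoint rule, and then locates $\alpha$ by bisection, which relies on $f$ being increasing on the interval. The function names are hypothetical.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def mean_value_alpha(f, a, b, tol=1e-10):
    """Return (mean height, alpha) with f(alpha) * (b - a) equal to the integral.

    Assumption for this sketch: f is increasing on [a, b], so bisection on
    f(x) - mean height finds the required point.
    """
    mean_height = midpoint_integral(f, a, b) / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < mean_height:
            lo = mid
        else:
            hi = mid
    return mean_height, (lo + hi) / 2


mean_height, alpha = mean_value_alpha(lambda x: x * x, 0.0, 3.0)
print(mean_height, alpha)
```

Here the integral is 9, the rectangle height is 3, and the located point is close to $\sqrt3$, which lies inside $[0,3]$ as the theorem requires.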


11.2. Maximum and minimum values. Suppose only that the function
$y=f(x)$ is bounded on an interval $[a,b]$. Then there is a GLB $c$ and a
LUB $d$ such that $c\le f(x)\le d$ in the interval $[a,b]$. It is not necessary,
however, that $f(x)$ takes every value in the interval $[c,d]$; it happens
to do so if $f(x)$ is continuous. The (extremely) hypothetical example,
represented graphically in Fig. 11.2, illustrates the possibilities to be
expected. The GLB $c$ occurs at F where $x=\alpha$; the LUB $d$ occurs at G
where $x=\beta$. From the point of view of picking the smallest and
largest values of $y=f(x)$ over the interval $[a,b]$, we may write:

$$\operatorname{Inf} f(x)=c \text{ at } x=\alpha$$
$$\operatorname{Sup} f(x)=d \text{ at } x=\beta$$

where Inf stands for 'infimum' and Sup for 'supremum'.

[Fig. 11.2]

A more important matter is the determination of local bounds, i.e. 
the smallest or largest values in a particular neighbourhood. This 
leads at once to the concept of (local) maximum and minimum values: 


DEFINITION: $f(x)$ has a maximum value Max $f(x)$ at $x=\alpha$ if there is
a neighbourhood $N$ of $\alpha$ such that $f(x)<f(\alpha)$ for all $x\in N$, $x\ne\alpha$; and a
minimum value Min $f(x)$ if $f(x)>f(\alpha)$ under the same conditions.


The possibilities are illustrated in Fig. 11.2; maximum values occur 
at B, D and G and minimum values at A, C and F. Being local concepts,
several maximum (minimum) values can occur; and one
maximum (e.g. at D) can be below some minimum (e.g. at A). In 
general, if f(x) is not continuous, the determination of the maximum 
(minimum) values of f(x) is a matter of direct use of the definition. 

Concentrate now on a particular point $x=\alpha$ and assume that
$f'(\alpha)$ exists. The sign taken by $f'(\alpha)$ is important in view of the
interpretation of $f'(\alpha)$ as the rate of change of $f(x)$ at $x=\alpha$:

THEOREM: If $f'(\alpha)>0$, then $f(x)$ is locally increasing at $x=\alpha$; if
$f'(\alpha)<0$, then $f(x)$ is locally decreasing at $x=\alpha$.

Proof: suppose $f'(\alpha)>0$ so that $\dfrac{f(\alpha+h)-f(\alpha)}{h}>0$ for sufficiently small
$h$. Write $x=\alpha+h$ and there exists a neighbourhood $N$ of $\alpha$ such that
$f(x)-f(\alpha)$ has the same sign as $x-\alpha$ for all $x$ of $N$. Hence $f'(\alpha)>0$
implies that $f(x)$ is below $f(\alpha)$ just to the left, and above $f(\alpha)$ just to
the right, of $x=\alpha$. The opposite holds for $f'(\alpha)<0$. Q.E.D.
It follows immediately:


THEOREM: A necessary condition for a maximum or minimum of
$f(x)$ at $\alpha$ is $f'(\alpha)=0$, i.e. a maximum or minimum at $\alpha$ implies $f'(\alpha)=0$.

For, if $f'(\alpha)>0$ or $f'(\alpha)<0$, the theorem above shows that there
cannot be a maximum or minimum at $x=\alpha$. In Fig. 11.2, derivatives
exist at the maxima B and D and at the minimum A; in all cases the
derivative is zero (tangent horizontal). Derivatives do not exist at C,
F and G so that the theorem does not apply. Conversely, at the point
E, the derivative is zero, but this is not a maximum or minimum
value. The condition $f'(\alpha)=0$ applies only when the derivative exists
and it is a necessary but not a sufficient condition.*

Various sufficient conditions can be found for a maximum (minimum)
value. For example, if $f'(x)$ exists in a neighbourhood $N$ of
$x=\alpha$ and if $f'(x)>0$ for $x<\alpha$ in $N$, $f'(x)<0$ for $x>\alpha$ in $N$, then $f(\alpha)$
is a maximum value. Similarly, if the signs are reversed, i.e. $f'(x)<0$
to the left and $f'(x)>0$ to the right of $x=\alpha$, then $f(\alpha)$ is a minimum
value.

Finally, for a function which has derivatives of all orders, a con- 
venient set of necessary and sufficient conditions can be written: 


THEOREM: The necessary and sufficient conditions for $f(x)$ to have a
maximum (minimum) at $x=\alpha$ are that the first non-zero derivative at
$x=\alpha$ is of even order and that its sign is negative (positive).

By Taylor's Theorem (11.1), if $f^{(n)}(\alpha)$ is the first non-zero derivative:

$$f(\alpha+h)=f(\alpha)+\frac{h^n}{n!}f^{(n)}(\alpha)+R_n$$

where
$$R_n=\frac{h^{n+1}}{(n+1)!}f^{(n+1)}(\alpha+\theta h)\quad(0<\theta<1).$$

Now $h$ can be taken so small that $R_n$ is numerically less than
$\frac{h^n}{n!}f^{(n)}(\alpha)$; this is because $R_n$ involves the higher power $h^{n+1}$. Hence,
for small $h$, the sign of $f(\alpha+h)-f(\alpha)$ is fixed by $\frac{h^n}{n!}f^{(n)}(\alpha)$. Suppose
$f^{(n)}(\alpha)<0$ and $n$ is even; then $\frac{h^n}{n!}f^{(n)}(\alpha)<0$ for all $h$, whether positive
or negative. Hence, $f(\alpha+h)-f(\alpha)$ is negative for all small $h$, i.e. $f(\alpha)$
is a maximum. Similarly, if $f^{(n)}(\alpha)>0$ and $n$ is even, then $f(\alpha)$ is a
minimum. The conditions are sufficient. Conversely, if $f(\alpha)$ is a maximum
(minimum), then $f(\alpha+h)-f(\alpha)$ is negative (positive) for all
sufficiently small $h$. Hence $\frac{h^n}{n!}f^{(n)}(\alpha)$ is negative (positive) for all
sufficiently small $h$ and the stated conditions follow. The conditions
are necessary. Q.E.D.

* All local maximum and minimum values of $y=f(x)$ are sometimes described as
extreme values of the function, i.e. an extreme value is either a maximum or a
minimum. All values of $y=f(x)$ such that $f'(x)=0$ are sometimes called stationary values
of the function, i.e. they comprise such maximum and minimum values and such
points of inflexion as occur where $f'(x)$ exists and is zero. Of the seven points indicated
in Fig. 11.2, all give extreme values except E, and all give stationary values except
C, F and G. There are extreme values which are not stationary, and stationary values
which are not extreme. However, for points where $f'(x)$ exists, the position is simpler:
all extreme values are stationary. The converse is still not true, i.e. stationary values
include points of inflexion (such as E) as well as maximum and minimum values.

The result is most suitable for use in practice. If $f(x)$ has derivatives
of all required orders at $x=\alpha$, we proceed as follows. First, check that
$f'(\alpha)=0$. Second, write $f''(\alpha)$ and see if it is non-zero; $f''(\alpha)<0$
implies that $f(\alpha)$ is a maximum and $f''(\alpha)>0$ that $f(\alpha)$ is a minimum.
If $f''(\alpha)=0$, write $f'''(\alpha)$; a non-zero value here means that $f(\alpha)$ is
neither a maximum nor a minimum. This is a case of a point of
inflexion, as at the point E of Fig. 11.2.* If $f'''(\alpha)=0$, write $f''''(\alpha)$;
if this is not zero, its sign determines whether $f(\alpha)$ is a maximum
(negative sign) or a minimum (positive sign). The process goes on
until a non-zero derivative is reached. Two examples:


(i) $y=x^3-3x^2+5$ for which $Dy=3x^2-6x=3x(x-2)$
and $D^2y=6x-6=6(x-1)$.

Hence $Dy=0$ at $x=0$ and at $x=2$.
At $x=0$, $D^2y=-6<0$, i.e. $y=5$ at $x=0$ is a maximum.
At $x=2$, $D^2y=6>0$, i.e. $y=1$ at $x=2$ is a minimum.

(ii) $y=x^3-3x^2+3x+5$ for which $Dy=3x^2-6x+3=3(x-1)^2$,
$D^2y=6x-6=6(x-1)$ and
$D^3y=6$.

Hence, at $x=1$: $Dy=D^2y=0$, $D^3y=6\ne0$. This is a point of inflexion.
Since $Dy=0$ only at $x=1$, there are no maximum or minimum
values.
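
The procedure of successive derivatives can be sketched in code for polynomials. This is an illustration only, not part of the original text; the function names are hypothetical and a polynomial is represented by a list whose entry `i` is the coefficient of $x^i$. With integer inputs the arithmetic is exact.

```python
def poly_derivative(coeffs):
    """coeffs[i] is the coefficient of x**i; return the derivative's coefficients."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x**i for i, c in enumerate(coeffs))


def classify_stationary_point(coeffs, alpha):
    """Apply the first-non-zero-derivative rule at x = alpha."""
    d = poly_derivative(coeffs)
    if poly_eval(d, alpha) != 0:
        return "not stationary"
    order = 1
    while d:
        d = poly_derivative(d)
        order += 1
        value = poly_eval(d, alpha)
        if value != 0:
            if order % 2 == 1:
                return "inflexion"          # first non-zero derivative of odd order
            return "maximum" if value < 0 else "minimum"
    return "constant"                        # every derivative vanishes


example_i = [5, 0, -3, 1]    # y = x^3 - 3x^2 + 5
example_ii = [5, 3, -3, 1]   # y = x^3 - 3x^2 + 3x + 5
print(classify_stationary_point(example_i, 0), poly_eval(example_i, 0))
print(classify_stationary_point(example_i, 2), poly_eval(example_i, 2))
print(classify_stationary_point(example_ii, 1))
```

The three calls reproduce the conclusions of examples (i) and (ii): a maximum $y=5$ at $x=0$, a minimum $y=1$ at $x=2$, and a point of inflexion at $x=1$.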


11.3. Convergence of series. A sequence of terms $u_n$ (for $n=1,2,3,\ldots$)
is a countably infinite set. Suppose we wish to add them. From this
point of view, the terms are an infinite series, written $u_1+u_2+u_3+\ldots$,
or more shortly as $\Sigma u_n$. The sum of $n$ terms can always be written:

$$S_n=\sum_{r=1}^{n}u_r=u_1+u_2+u_3+\ldots+u_n.$$


* A point of inflexion is one at which the curve turns over the tangent; it can occur 
for upward or downward sloping, as well as for horizontal, tangents. 




The question is whether any meaning can be attached to the sum of
the whole infinite series. The fact that this is a real question, and
probably not an easy one to answer, is clear from the following argument:

Write $a=1+\tfrac12+\tfrac13+\tfrac14+\ldots$ and $b=1+\tfrac13+\tfrac15+\tfrac17+\ldots$

Then
$$a=(1+\tfrac13+\tfrac15+\ldots)+(\tfrac12+\tfrac14+\tfrac16+\ldots)=b+\tfrac12(1+\tfrac12+\tfrac13+\ldots)=b+\tfrac12a.$$

Hence $b=\tfrac12a$. Now write $c=1-\tfrac12+\tfrac13-\tfrac14+\ldots$ so that
$$c=(1+\tfrac13+\tfrac15+\ldots)-(\tfrac12+\tfrac14+\tfrac16+\ldots)=b-\tfrac12a=0.$$

But $c=(1-\tfrac12)+(\tfrac13-\tfrac14)+\ldots>0$.

There is obviously something amiss. There is a fallacy, and it is in the
first line, in the all-too-facile assumption that these two infinite
series do have sums $a$ and $b$. It will be seen that this assumption is
wrong.


The sum $S_n=\sum_{r=1}^{n}u_r$ is itself a sequence for $n=1,2,3,\ldots$. The ideas
of 9.4 and 9.5 on limits of sequences are of immediate application and
we may ask whether $S_n$ has a limit as $n\to\infty$. If so, the series is
convergent:

DEFINITION: If $S_n=\sum_{r=1}^{n}u_r\to S$ as $n\to\infty$, the infinite series $\Sigma u_n$ is
convergent and its sum is $S$.


In other cases, the infinite series is not convergent and there is no
sum. The properties of limits provide two simple results: if $\Sigma u_n$
converges to $S$ and if $k$ is a constant, then $\Sigma v_n$ where $v_n=ku_n$ converges
to $kS$; if $\Sigma u_n$ converges to $S_1$ and $\Sigma v_n$ converges to $S_2$, then
$\Sigma w_n$ where $w_n=u_n+v_n$ converges to $S_1+S_2$.

The theorem of 9.5 on conditions for a limit translates into an 
appropriate form for convergence of series: 


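THEOREM: $\Sigma u_n$ is convergent if and only if, given $\varepsilon$, there is an
integer $N$ such that $|S_m-S_n|<\varepsilon$ for all $m$ and $n$ both greater than $N$.

The theorem of 9.5 states that, given $\varepsilon$ (and so given $\tfrac12\varepsilon$), there is an
integer $N$ such that $S-\tfrac12\varepsilon<S_p<S+\tfrac12\varepsilon$ for all $p>N$, if $S_n$ is to tend
to $S$ as $n\to\infty$. Hence, if both $m$ and $n$ are greater than $N$:

$$S-\tfrac12\varepsilon<S_m<S+\tfrac12\varepsilon\quad\text{and}\quad S-\tfrac12\varepsilon<S_n<S+\tfrac12\varepsilon$$

i.e. $S_m$ and $S_n$ can differ by $\varepsilon$ at most. Q.E.D.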




As a necessary condition for convergence, $u_n\to0$ as $n\to\infty$. This
follows from the theorem just proved. We have: $u_n=S_n-S_{n-1}$.
Suppose $\Sigma u_n$ is convergent, so that, given $\varepsilon$, we can select an integer
(say) $N-1$ such that $|S_n-S_{n-1}|<\varepsilon$ for $n$ and $n-1$ both greater
than $N-1$. Hence $|u_n|<\varepsilon$ for all $n>N$, i.e. $u_n\to0$ as $n\to\infty$. The
condition, however, is by no means a sufficient one; we find many
examples of series which have $u_n\to0$ as $n\to\infty$ but which are not
convergent.

The various possibilities can be illustrated by a number of 
examples: 


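        A: Alternating Series                                   B: Positive Series
(i)     $1-r+r^2-r^3+\ldots$ ($0<r<1$)                    C     $1+r+r^2+r^3+\ldots$ ($0<r<1$)                    C
(ii)    $1-\tfrac12+\tfrac13-\tfrac14+\ldots$                         C     $1+\tfrac12+\tfrac13+\tfrac14+\ldots$                         NC
(iii)   $1-1+1-1+\ldots$                                  NC    $1+1+1+1+\ldots$                                  NC
(iv)    $1-\tfrac32+\tfrac94-\tfrac{27}{8}+\ldots$                       NC    $1+\tfrac32+\tfrac94+\tfrac{27}{8}+\ldots$                       NC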


Four pairs of infinite series are shown; the second of each pair is a 
series of positive terms, the first the same series but with terms 
alternating in sign. The pairs are arranged in an obvious order; the 
terms of (i) decrease rapidly, those of (ii) less rapidly, while (iii) and 
(iv) have terms which do not decrease. By the necessary condition, 
though (i) and (ii) may be convergent, (iii) and (iv) are certainly not.
In the table, C indicates a convergent series and NC one which is not 
convergent, as established by the following analysis: 

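(i) These are geometric progressions of the form $1+r+r^2+r^3+\ldots$
with $S_n=\dfrac{1-r^n}{1-r}$ ($r\ne1$). Here the common ratio is numerically less
than 1 in both series (negative in A, positive in B), so that $r^n\to0$ and

$$S_n=\frac{1-r^n}{1-r}\to\frac{1}{1-r}\quad\text{as } n\to\infty.$$

Both series are convergent, each with sum $\dfrac{1}{1-r}$ for its own ratio $r$.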

(ii) There is no simple or obvious expression for $S_n$, for all integral
$n$, and indirect methods need to be adopted in determining the convergence
or otherwise of the series. Equally, in the absence of an
explicit form for $S_n$, the sum $S$ of the infinite series (if convergent)
cannot be found directly.


A:
$$S_{2n+1}=S_{2n-1}-\frac{1}{2n}+\frac{1}{2n+1}<S_{2n-1}\quad\text{since } \frac{1}{2n+1}<\frac{1}{2n}$$
$$S_{2n+2}=S_{2n}+\frac{1}{2n+1}-\frac{1}{2n+2}>S_{2n}\quad\text{since } \frac{1}{2n+2}<\frac{1}{2n+1}$$

for any positive integer $n$. Hence, the odd sums $S_1, S_3, S_5, \ldots$ form a
decreasing sequence, none of them greater than $S_1$; the even sums
$S_2, S_4, S_6, \ldots$ form an increasing sequence, none less than $S_2$.

Further: $S_{2n+1}-S_{2n}=\dfrac{1}{2n+1}$, which can be made as small as we please
for sufficiently large $n$. Hence the decreasing sequence of odd sums
tends to the same limit $S$ as the increasing sequence of even sums and
$S_2<S<S_1$, i.e. $\tfrac12<S<1$. The series is convergent with a sum which
we cannot yet specify, except that it lies between $\tfrac12$ and 1.

B:
$$1+\tfrac12+\tfrac13+\tfrac14+\ldots=1+\tfrac12+(\tfrac13+\tfrac14)+(\tfrac15+\tfrac16+\tfrac17+\tfrac18)+\ldots>1+\tfrac12+\tfrac12+\tfrac12+\ldots$$
since $\tfrac13>\tfrac14$, i.e. $\tfrac13+\tfrac14>\tfrac24=\tfrac12$, and similarly for the other groups.
Hence, if enough terms of the series are taken to make up $n$ groups,
the sum is greater than $1+\tfrac12+\tfrac12+\ldots+\tfrac12=1+\tfrac12n\to\infty$ as $n\to\infty$. The series is
not convergent; the sum $S_n$ increases steadily and indefinitely
with $n$.

(iii) These series are not convergent since the general term does
not tend to zero. Since expressions for $S_n$ can be written, we can say
something more.

A: $S_n=1$ ($n$ odd) and $S_n=0$ ($n$ even)
B: $S_n=n\to\infty$ as $n\to\infty$.

Hence series A oscillates between finite limits (0 and 1) but series B
has a sum which increases steadily and indefinitely with $n$.

(iv) These series are also not convergent; indeed $|u_n|\to\infty$ as
$n\to\infty$ for each. They are geometric progressions with $r=\mp\tfrac32$; the
sums $S_n$ are as follows.

A:
$$S_n=\frac{1-(-\tfrac32)^n}{1-(-\tfrac32)}=\tfrac25\{1-(-\tfrac32)^n\}$$
and this oscillates with increasing amplitude as $n$ increases.

B:
$$S_n=\frac{(\tfrac32)^n-1}{\tfrac32-1}=2\{(\tfrac32)^n-1\}\to\infty\quad\text{as } n\to\infty.$$

Hence the series A oscillates and series B does not; in both cases, the
absolute value of $S_n$ increases indefinitely with $n$.

Some general conclusions can be drawn. First, there are only two 
possibilities for a series of positive terms: either the series is convergent
with $S_n$ increasing steadily to $S$, or the series is not convergent
with $S_n$ increasing steadily and indefinitely. Second, there is
a greater variety of possibilities for a series with mixed positive and
negative terms, including cases of (finite or indefinite) oscillation of
$S_n$. Third, the series of alternating terms may often be the same
(simply as regards convergence or otherwise) as the series of positive 
terms. But there is the interesting marginal case, illustrated by (ii), 
where the alternating series is convergent and the positive series not 
convergent. 
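
The behaviour of the partial sums in pairs (ii), (iii) and (iv) can be tabulated directly. The sketch below is an illustration and not part of the original text; `partial_sums` is a hypothetical helper, and exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction


def partial_sums(term, n_terms):
    """Return [S_1, ..., S_n] for the series with terms u_r = term(r), r = 1, 2, ..."""
    sums, total = [], Fraction(0)
    for r in range(1, n_terms + 1):
        total += term(r)
        sums.append(total)
    return sums


# (ii) A and B: 1 - 1/2 + 1/3 - ... and 1 + 1/2 + 1/3 + ...
alt_harmonic = partial_sums(lambda r: Fraction((-1) ** (r + 1), r), 200)
harmonic = partial_sums(lambda r: Fraction(1, r), 200)
# (iii) A: 1 - 1 + 1 - 1 + ...
alt_ones = partial_sums(lambda r: Fraction((-1) ** (r + 1)), 6)
# (iv) B: 1 + 3/2 + 9/4 + ...
geometric_3_2 = partial_sums(lambda r: Fraction(3, 2) ** (r - 1), 10)

print(float(alt_harmonic[-1]))         # stays between S_2 = 1/2 and S_1 = 1
print(float(harmonic[127]))            # already larger than 1 + 7/2 after 128 terms
print([int(s) for s in alt_ones])      # oscillates between 1 and 0
print(float(geometric_3_2[-1]))        # equals 2{(3/2)^10 - 1}
```

The alternating sums of (ii) are squeezed between the even and odd partial sums, while the positive series of (ii) and (iv) grow without limit, in line with the table.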


11.4. Series of positive terms. The series $\Sigma u_n$, where $u_n>0$ all $n$, has
a sum $S_n$ which is positive and increasing with $n$. This simplifies the
situation considerably:

either $S_n$ is bounded above, increasing to a limit $S$, with $\Sigma u_n$ convergent
to $S$;

or $S_n$ is not bounded above, increasing without limit, with $\Sigma u_n$
not convergent.

For series where no simple expression for $S_n$ is available, we require
tests which distinguish between the alternatives. The simplest of
such tests turns on:

COMPARISON THEOREM: If $u_n\le kv_n$ where $\Sigma v_n$ is convergent and $k$ a
positive constant, then $\Sigma u_n$ is convergent; if $u_n\ge kv_n$ where $\Sigma v_n$ is not
convergent and $k$ a positive constant, then $\Sigma u_n$ is not convergent.


The proof is simple. In the first case, if $S_n=\sum_{r=1}^{n}u_r$ and $S_n'=\sum_{r=1}^{n}v_r$, then
$S_n\le kS_n'$. But $S_n'$ has a limit ($n\to\infty$) which is an upper bound of $S_n'$;
hence $S_n$ is bounded above and $\Sigma u_n$ is convergent. Similarly for the
other case.

An obvious series for comparison is the geometric series:

$$1+r+r^2+r^3+\ldots$$

where $r$ is positive. This is convergent to $\dfrac{1}{1-r}$ if $r<1$ and not
convergent if $r\ge1$. The following tests* then follow:

CAUCHY'S TEST: If $u_n\le kr^{n-1}$ (all $n$), where $0<r<1$ and $k$ a positive
constant, then $\Sigma u_n$ is convergent to sum $S\le\dfrac{k}{1-r}$.

D'ALEMBERT'S TEST: If $\dfrac{u_n}{u_{n-1}}\le r$ (all $n>1$), where $0<r<1$, then
$\Sigma u_n$ is convergent to sum $S\le\dfrac{u_1}{1-r}$.


The proof of the first is a direct application of the Comparison
Theorem. The second is reduced to the first by writing:

$$u_n=\frac{u_n}{u_{n-1}}\cdot\frac{u_{n-1}}{u_{n-2}}\cdots\frac{u_2}{u_1}\,u_1\le r^{n-1}u_1.$$


Since a finite number of terms can always be added to or removed 
from a series without affecting its convergence, the tests apply 
equally well if the conditions hold for sufficiently large n. (The sum 
S, however, is changed.) Hence there is a more practical form of the 
tests using limits: 


If $\sqrt[n]{u_n}$ or $\dfrac{u_n}{u_{n-1}}$ tends to a limit less than 1, $\Sigma u_n$ is convergent.

Proof: suppose $\sqrt[n]{u_n}\to L$ as $n\to\infty$ ($L<1$). Given $\varepsilon$, there is an integer
$N$ such that $L-\varepsilon<\sqrt[n]{u_n}<L+\varepsilon$ for all $n>N$. Write $L+\varepsilon=r$ and
choose $\varepsilon$ so that $r<1$ (as well as $L<1$). Then $\sqrt[n]{u_n}<r$, i.e. $u_n<r^n$,
for all $n>N$, which is Cauchy's Test (with $k=r$). The other case
follows similarly.
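
The limit form of the two tests can be tried on a concrete series. The sketch below is not from the original text: the series $u_n=n/2^n$ is an assumed example and `ratio_and_root` is a hypothetical helper. Both quantities should approach $\tfrac12$, a limit less than 1, so the series is convergent by either test.

```python
def ratio_and_root(term, n):
    """Return (u_n / u_{n-1}, n-th root of u_n) for a series of positive terms."""
    return term(n) / term(n - 1), term(n) ** (1.0 / n)


def term(n):
    """Assumed example series: u_n = n / 2^n."""
    return n / 2.0**n


for n in (5, 50, 500):
    ratio, root = ratio_and_root(term, n)
    print(n, ratio, root)
```

The ratio settles towards the limit more quickly than the root, which is typical and is one reason d'Alembert's form is often the easier one to apply.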


It can be noticed, in passing, that the question of the convergence
of an infinite series is matched by a similar question of the convergence
of an infinite integral. The integral $\int_a^k f(x)\,dx$ can always be
written if $f(x)$ is continuous for $x\ge a$. It remains to investigate
whether $\int_a^k f(x)\,dx\to L$ as $k\to\infty$. If so, the infinite integral exists:
$\int_a^\infty f(x)\,dx=L$ (see 10.9 Ex. 24). A further test of the convergence of
an infinite series of positive terms can be derived from consideration
of an infinite integral (11.9 Ex. 19).


* The first test was given by Cauchy (1789-1857) in the particular case where k =1; 
the second test was given by d’Alembert (1717-83). 




11.5. Absolute and conditional convergence. When an infinite series
consists of mixed positive and negative terms, the problem of testing
for convergence (in the absence of a simple expression for $S_n$) is
complicated by the greater variety of possibilities. A series may fail
to be convergent for several reasons: $S_n$ may increase steadily and
indefinitely through positive values, or decrease steadily and indefinitely
through negative values, or oscillate between finite limits
or between indefinitely increasing values.* The best procedure in
practice is to start with the corresponding series of positive terms,
i.e. to get at the convergence of $\Sigma u_n$ by first examining $\Sigma|u_n|$.

Suppose that $\Sigma|u_n|$ is convergent. Then the series $\Sigma v_n$ consisting
of the positive terms of $\Sigma u_n$ is convergent. Equally, the series $\Sigma w_n$
which consists of the negative terms of $\Sigma u_n$, each with sign changed,
is convergent. Here, $\Sigma|u_n|=\Sigma v_n+\Sigma w_n$ and $\Sigma u_n=\Sigma v_n-\Sigma w_n$. Hence,
if $\Sigma|u_n|$ is convergent, so is $\Sigma u_n$ and it is then said to be 'absolutely
convergent':

DEFINITION: The series $\Sigma u_n$ is absolutely convergent if $\Sigma|u_n|$ is
convergent; $\Sigma u_n$ is necessarily convergent if it is absolutely convergent.
The series $\Sigma u_n$ is conditionally convergent if it is convergent but $\Sigma|u_n|$
is not convergent.

An illustration of absolute convergence is a geometric series
$$1+r+r^2+r^3+\ldots$$
with $|r|<1$. The fact that, if $\Sigma|u_n|$ is not convergent, then $\Sigma u_n$
may be convergent, is illustrated by example (ii) of 11.3. The series
$1-\tfrac12+\tfrac13-\tfrac14+\ldots$ is conditionally convergent; it is convergent while
$1+\tfrac12+\tfrac13+\tfrac14+\ldots$ is not.

For testing series of mixed terms, if not absolutely convergent, we 
have very little to go on; indeed, there is only one result of general 
assistance : 

THEOREM: If $u_n$ is positive and decreases steadily to zero as $n\to\infty$,
then the alternating series $u_1-u_2+u_3-u_4+\ldots$ is convergent.

The proof follows on exactly the lines used in the particular case of
example (ii) A of 11.3. The sum of the series lies between $u_1-u_2$
and $u_1$.


* The negative term ‘non-convergent’ covers all these possibilities, but more 
positive labels are variously attached by different writers. Some describe all non- 
convergent series as ‘divergent’; others distinguish between ‘divergent’ and ‘oscil- 
latory’ series. 


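Two further examples illustrate:

(i) The series $1+1+\dfrac{1}{2!}+\dfrac{1}{3!}+\ldots$ is convergent by d'Alembert's
test, since
$$\frac{u_n}{u_{n-1}}=\frac{(n-2)!}{(n-1)!}=\frac{1}{n-1}<1\quad\text{for } n>2.$$

The series $1-1+\dfrac{1}{2!}-\dfrac{1}{3!}+\ldots$ is absolutely convergent.

(ii) The series $1-\tfrac13+\tfrac15-\tfrac17+\ldots$ is convergent by the theorem above.
But:
$$1+\tfrac13+\tfrac15+\tfrac17+\ldots=1+\tfrac13+(\tfrac15+\tfrac17)+(\tfrac19+\tfrac1{11}+\tfrac1{13}+\tfrac1{15})+\ldots>1+\tfrac14+\tfrac14+\tfrac14+\ldots$$

If enough terms of the series are taken to make up $n$ groups, the sum exceeds

$$1+\tfrac14+\tfrac14+\ldots+\tfrac14=1+\tfrac14n\to\infty\quad\text{as } n\to\infty,$$

i.e. the series is not convergent. Consequently, $1-\tfrac13+\tfrac15-\tfrac17+\ldots$ is
only conditionally convergent.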
A convergent series can be used, by taking enough terms, to give
an approximation to its sum. For example, if the sum of the series
$1+1+\dfrac{1}{2!}+\dfrac{1}{3!}+\dfrac{1}{4!}+\ldots$ is denoted by the constant $e$, a rational
approximation to $e$ (which is irrational) is:

$$e=1+1+\tfrac12+\tfrac16+\tfrac1{24}+\tfrac1{120}+\tfrac1{720}+\tfrac1{5040}+\ldots$$

    = 2.5
    + 0.1667
    + 0.0417
    + 0.0083
    + 0.0014
    + 0.0002 + ...
    = 2.718 to three decimal places.

Seven terms of the series are required for this approximation.
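
The count of seven terms can be confirmed by summing the series exactly. This is a small illustrative sketch, not part of the original text; `e_partial_sum` is a hypothetical name and exact fractions avoid any rounding before the final conversion.

```python
from fractions import Fraction
from math import factorial


def e_partial_sum(n_terms):
    """Sum of the first n_terms of 1 + 1 + 1/2! + 1/3! + ... as an exact fraction."""
    return sum(Fraction(1, factorial(r)) for r in range(n_terms))


for n_terms in range(1, 9):
    s = e_partial_sum(n_terms)
    # Show the exact partial sum and its value rounded to three decimal places.
    print(n_terms, s, round(float(s), 3))
```

Six terms give 2.717 when rounded to three places, and seven terms give 2.718, so the seventh term is the first at which the stated approximation is reached.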


11.6. Power series. The development of 11.1 suggests that Taylor's
Series may give an approximation to a given function $f(x)$ as a
polynomial in ascending powers of $x$, and an exact representation of
$f(x)$ as the sum of an infinite series. The remainder $R_n$ in Taylor's
Theorem is critical; it must be small for the approximation and it
must tend to zero as $n\to\infty$ for the exact representation. This is the
matter now to be pursued, i.e. the expansion of a function $f(x)$ in
ascending powers of $x$.

If $x$ is a real variable and $a_0, a_1, a_2, \ldots a_n, \ldots$ a set of constants, then
$\Sigma a_nx^n=a_0+a_1x+a_2x^2+\ldots$ is a power series. Given a value of $x$, the
series may be tested for convergence in the ways described above.
It may turn out to be convergent for no $x$ ($x\ne0$) or for all $x$; examples
of both cases are given later. In general, however, it is convergent
for some and not for other values of $x$. A most convenient result can
be established; it is that a power series is convergent over an interval
(open or closed) around $x=0$ and absolutely convergent within the
interval. It is just not possible for the series to be convergent for
disconnected values of $x$. This result is reached in two stages.

THEOREM: If $\Sigma a_nx^n$ is convergent at $x=\alpha$, then it is absolutely convergent
for $-r<x<r$, where $r=|\alpha|$.

Proof: since $\Sigma a_n\alpha^n$ is convergent, it is necessary that $a_n\alpha^n\to0$ as
$n\to\infty$, i.e. that $a_n\alpha^n$ is bounded: $|a_n\alpha^n|<k$ for some positive $k$ and
all $n$. Hence:

$$|a_nx^n|=|a_n\alpha^n|\,|x/\alpha|^n<k\,|x/\alpha|^n$$

so that, by Cauchy's Test (11.4), $\Sigma|a_nx^n|$ is convergent if $|x/\alpha|<1$,
i.e. $\Sigma a_nx^n$ is absolutely convergent if $|x|<|\alpha|=r$. Q.E.D.


THEOREM: Given a power series $\Sigma a_nx^n$, there are three possibilities:

(i) it is convergent for no non-zero $x$
or (ii) it is absolutely convergent for all $x$
or (iii) it is absolutely convergent for $|x|<r$, and not convergent for
$|x|>r$, where $r$ is some positive constant.

Notice that, in case (iii), nothing is said about the convergence of the
series for $x=\pm r$; this is a matter left entirely open. The proof is:
Partition the set of all real numbers $x\ge0$ into two subsets A and B,
where A consists of $x$ such that $\Sigma a_nx^n$ is convergent and B consists
of $x$ such that $\Sigma a_nx^n$ is not convergent. A is not empty since it contains
$x=0$. B consists only of positive $x$ but it may be empty. If B
is not empty, then $\alpha<\beta$ for all $\alpha\in A$ and $\beta\in B$. This follows from two
applications of the theorem just established. If $\alpha\in A$, $\Sigma a_n\alpha^n$ is convergent
and $\Sigma a_nx^n$ is convergent for all $x<\alpha$, i.e. there is no $\beta\in B$
which is less than $\alpha$. Conversely, if $\beta\in B$, $\Sigma a_n\beta^n$ is not convergent and
$\Sigma a_nx^n$ cannot be convergent at any $x=\alpha>\beta$; otherwise it would need
to be convergent also at $\beta$. Hence there is no $\alpha\in A$ which is greater
than $\beta$. Finally $\alpha=\beta$ is ruled out since A and B are disjoint sets.
Hence $\alpha<\beta$ always, i.e. all elements of A are less than all elements of
B. If $r$ is the LUB of set A, the dividing value between A and B is $r$,
belonging either to A or to B. Similarly, the set of real $x\le0$ is
partitioned into two subsets A' and B' for convergence and non-convergence
of $\Sigma a_nx^n$ respectively. All elements of A' are less in
absolute value than all elements of B' and there is a dividing value
$-r'$. The preceding theorem then shows that the interval of convergence
must be symmetrical (i.e. $r=r'$) and that, within the interval,
$\Sigma a_nx^n$ is not only convergent but also absolutely convergent.
Pulling the cases together, we find three possibilities: A and A'
contain only $x=0$ which is case (i); or B and B' are both empty
which is case (ii); or there is a positive $r$ such that the series is
absolutely convergent for $-r<x<r$ and not convergent for $x>r$
and $x<-r$. Q.E.D.

Hence, in handling $\Sigma a_nx^n$, we look for the marginal values $x=\pm r$.
The series is absolutely convergent within the interval $[-r,r]$ and
not convergent outside it. At $x=\pm r$, the series may or may not be
convergent. When found, $r$ is called the radius of convergence of the
power series. Illustrative examples follow, arranged under the headings
(i), (ii) and (iii) of the three cases:

(i) $1+x+2!x^2+3!x^3+\ldots+n!x^n+\ldots$ is convergent for no non-zero $x$.

Here: $u_n=n!x^n=nx\times(n-1)!x^{n-1}=nx\times u_{n-1}$ if the series is $\sum_{n=0}^{\infty}u_n$,
i.e. $|u_n|=n|x|\,|u_{n-1}|$.

No matter what non-zero $x$ is taken, there is $n$ sufficiently large for
$n|x|>1$, i.e. for $|u_n|>|u_{n-1}|$. Hence, $u_n$ cannot tend to zero as
$n\to\infty$ and the series is not convergent for any non-zero $x$.


(ii) $1+x+\dfrac{x^2}{2!}+\dfrac{x^3}{3!}+\ldots+\dfrac{x^n}{n!}+\ldots$ is absolutely convergent for all $x$.

Here: $u_n=\dfrac{x^n}{n!}$ and $\left|\dfrac{u_n}{u_{n-1}}\right|=\dfrac{|x|}{n}\to0$ as $n\to\infty$.

By d'Alembert's Test (11.4), $\Sigma|u_n|$ is convergent, i.e. $\Sigma u_n$ is absolutely
convergent, all $x$. The same result holds for other series obtained
by taking some (but not all) terms of the series, and by varying their
signs. For example:

$$1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\frac{x^8}{8!}-\ldots$$
$$x-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\ldots$$

are each absolutely convergent for all $x$.

(iii) (a) $x-\dfrac{x^2}{2}+\dfrac{x^3}{3}-\ldots+(-1)^{n-1}\dfrac{x^n}{n}+\ldots$ is absolutely convergent,
$-1<x<1$. The series is convergent at $x=1$, not convergent
at $x=-1$. The conditional convergence of the series when $x=1$
follows from example (ii) of 11.3. Hence, by the first theorem above,
the power series is absolutely convergent for $-1<x<1$ and not
convergent outside this interval.

(b) $x-\dfrac{x^3}{3}+\dfrac{x^5}{5}-\dfrac{x^7}{7}+\ldots+(-1)^{n-1}\dfrac{x^{2n-1}}{2n-1}+\ldots$ is absolutely convergent,
$-1<x<1$. It is conditionally convergent for $x=\pm1$. When
$x=1$, the series is $1-\tfrac13+\tfrac15-\tfrac17+\ldots$ which is conditionally convergent
by example (ii) of 11.5. The same series is obtained, with all signs
reversed, when $x=-1$. The absolute convergence of the power
series for $-1<x<1$ again follows by the first theorem above.
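
For series such as (i), (ii) and (iii)(a), the radius of convergence can be read off from the coefficients. The sketch below is an illustration and not part of the original text. It assumes the standard consequence of d'Alembert's Test that, when $|a_n/a_{n+1}|$ tends to a limit (or increases indefinitely), that limit is the radius $r$; `coefficient_ratio` is a hypothetical helper.

```python
from fractions import Fraction
from math import factorial


def coefficient_ratio(a, n):
    """|a_n / a_{n+1}| as an exact fraction; its limit, if any, is taken as the radius r."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))


def a_i(n):
    return factorial(n)                      # (i)   a_n = n!


def a_ii(n):
    return Fraction(1, factorial(n))         # (ii)  a_n = 1/n!


def a_iii(n):
    return Fraction((-1) ** (n - 1), n)      # (iii)(a) a_n = (-1)^(n-1)/n, n >= 1


for label, a in (("(i)", a_i), ("(ii)", a_ii), ("(iii)(a)", a_iii)):
    print(label, [float(coefficient_ratio(a, n)) for n in (10, 100, 1000)])
```

The ratios shrink towards 0 for (i), grow without limit for (ii), and approach 1 for (iii)(a), matching the three cases of the theorem. The behaviour at $x=\pm r$ still has to be examined separately, as the text stresses.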


11.7. Expansions of functions. A function $f(x)$ has derivatives of all
orders for all values of $x$ (at least in a certain interval around $x=0$).
Then, by Taylor's Theorem of 11.1:

$$f(x)=f(0)+f'(0)x+f''(0)\frac{x^2}{2!}+\ldots+f^{(n)}(0)\frac{x^n}{n!}+R_n \tag{1}$$

where
$$R_n=f^{(n+1)}(\theta x)\frac{x^{n+1}}{(n+1)!}\quad\text{or}\quad f^{(n+1)}(\theta x)(1-\theta)^n\frac{x^{n+1}}{n!} \tag{2}$$

for some real $\theta$ between 0 and 1.

The next step is the difficult one: to find what values of $x$ have
$R_n\to0$ as $n\to\infty$. By the main theorem for power series (11.6), we
expect either no $x$ (except $x=0$), or all $x$, or $x$ within a certain interval
$[-r,r]$, where $r$ is the radius of convergence of the power series
written by continuing (1) indefinitely:

$$f(x)=f(0)+f'(0)x+f''(0)\frac{x^2}{2!}+\ldots+f^{(n)}(0)\frac{x^n}{n!}+\ldots \tag{3}$$

The first two cases can be included in the third by allowing $r=0$ and
$r\to\infty$. The difficulty remains; it is to find the appropriate $r$. If this
can be done:

THEOREM: If $f(x)$ has derivatives of all orders and if $R_n$ given by (2)
tends to zero as $n\to\infty$ for all $x$ within the interval $-r<x<r$, then $f(x)$
can be expanded as the power series (3), absolutely convergent for
$-r<x<r$.

There is uncertainty whether or not the expansion holds also for
$x=\pm r$, a point which needs examination in each case.

To illustrate the problem of examining the limit of $R_n$ and hence
of fixing the radius of convergence, consider the expansion of the
power function $(1+x)^m$, where x is a real variable and m is a given
rational exponent. This is one of the most frequently used of all
expansions, a very important one to establish. Two particular cases
can be conveniently taken before the general case.


Case: $m=-1$, i.e. the expansion of $(1+x)^{-1}$.

The geometric series $1 - x + x^2 - x^3 + \dots$ has partial sums tending to
$\frac{1}{1+x}$ as $n \to \infty$, provided that $|x|<1$. Hence we have the expansion:

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots \quad\text{for } -1<x<1 \qquad (4)$$

It can be checked that the coefficients are those written in (3).
Generally:

$$D^n\left(\frac{1}{1+x}\right) = \frac{(-1)^n\, n!}{(1+x)^{n+1}} \quad\text{i.e.}\quad \left[D^n\left(\frac{1}{1+x}\right)\right]_{x=0} = (-1)^n\, n!$$

So (3) gives $\frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots$; (4) simply adds the fact that
the radius of convergence is r=1, that the series is absolutely
convergent for $|x|<1$.

Case: $m=n$ (positive integer), i.e. the expansion of $(1+x)^n$.

Here the expansion is a finite polynomial of degree n, the infinite
series of (3) terminating with the term in $x^n$. The polynomial is given
by the Binomial Theorem of elementary algebra:

$$(1+x)^n = \sum_{r=0}^{n}\binom{n}{r}x^r = 1 + \binom{n}{1}x + \binom{n}{2}x^2 + \dots + \binom{n}{n-1}x^{n-1} + x^n \qquad (5)$$
$$\text{where}\quad \binom{n}{r} = \frac{n!}{r!\,(n-r)!} \qquad r=1, 2, 3, \dots, n-1$$

with the convention that $\binom{n}{0} = \binom{n}{n} = 1$ (see 1.7 above). For particular
values of the integers n and r, the values of $\binom{n}{r}$ can be got from the
formula or, more easily, from Pascal’s Triangle (5.8 above). For
n=2 and n=3, (5) gives:

$$(1+x)^2 = 1+2x+x^2 \quad\text{and}\quad (1+x)^3 = 1+3x+3x^2+x^3$$

as can be checked by squaring and cubing $1+x$ directly. It can
again be checked that the coefficients of (5) agree with those written
in (3). For:

$$D^r(1+x)^n = n(n-1)(n-2)\dots(n-r+1)(1+x)^{n-r} \quad (r\le n), \qquad D^r(1+x)^n = 0 \quad (r>n)$$

Hence, the last term in (3) is $x^n$, all later ones being zero. For $r\le n$,
the term in $x^r$ is:

$$n(n-1)(n-2)\dots(n-r+1)\frac{x^r}{r!} = \frac{n!}{r!\,(n-r)!}x^r = \binom{n}{r}x^r$$

as given in (5).
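
The agreement between the factorial formula and Pascal's Triangle can be checked mechanically. The following is a minimal sketch, not part of the original text; the function names `binom` and `pascal_row` are assumptions introduced here for illustration.

```python
from math import factorial

def binom(n, r):
    """Coefficient of x^r in (1 + x)^n from the factorial formula n!/(r!(n-r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def pascal_row(n):
    """Row n of Pascal's Triangle, built by adding adjacent entries of the row above."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

# Compare the two constructions for a few values of n.
for n in (2, 3, 6):
    print(n, pascal_row(n), [binom(n, r) for r in range(n + 1)])
```

For n=2 and n=3 the rows reproduce the coefficients of the two expansions written out above.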
General Case, the expansion of $(1+x)^m$, m any rational.

The general derivative is:

$$D^n(1+x)^m = m(m-1)(m-2)\dots(m-n+1)(1+x)^{m-n}$$

and so: $\left[D^n(1+x)^m\right]_{x=0} = m(m-1)(m-2)\dots(m-n+1)$.

Then (1) and (2) give:

$$(1+x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \dots + \frac{m(m-1)(m-2)\dots(m-n+1)}{n!}x^n + R_n$$

where

$$R_n = \frac{m(m-1)(m-2)\dots(m-n)}{n!}(1+\theta x)^{m-n-1}(1-\theta)^n x^{n+1} \qquad (0<\theta<1).$$


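Some awkward algebra is needed to determine whether (and for
what x) $R_n \to 0$ as $n \to \infty$. Write $R_n = \phi(n)\psi(n)$ where:

$$\phi(n) = \frac{m(m-1)(m-2)\dots(m-n)}{n!}x^{n+1} \quad\text{and}\quad \psi(n) = \frac{(1-\theta)^n}{(1+\theta x)^{n+1-m}}$$

Then $\phi(n) = \frac{m-n}{n}\,x\,\phi(n-1)$, i.e.

$$\left|\frac{\phi(n)}{\phi(n-1)}\right| = \left|\frac{m}{n}-1\right| \times |x| \to |x| \quad\text{as } n \to \infty.$$

This limit is less than one, and so $|\phi(n)| < |\phi(n-1)|$ for sufficiently
large n, provided that $|x|<1$. In this case, $\phi(n)$ is bounded for all n.

Further $\psi(n) = (1+\theta x)^{m-1}\left(\frac{1-\theta}{1+\theta x}\right)^n \to 0$ as $n \to \infty$ if $|x|<1$. This is so
since, given $0<\theta<1$ and $|x|<1$: $0 < \frac{1-\theta}{1+\theta x} < 1$ and so $\left(\frac{1-\theta}{1+\theta x}\right)^n \to 0$
as $n \to \infty$; $(1+\theta x)^{m-1}$ is not affected, being independent of n. Hence,
$R_n = \phi(n)\psi(n) \to 0$ as $n \to \infty$ if $|x|<1$. We have what we need and we
can write:

$$(1+x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \frac{m(m-1)(m-2)}{3!}x^3 + \dots \quad (\text{for } -1<x<1) \qquad (6)$$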


The expansion (6) is called the Binomial Series. When m is a positive 
integer, (6) reduces to the finite series (5) of the Binomial Theorem. 
For other rational m, the series (6) does not terminate. Some ex- 
amples: 
(i) $m=-1$: general term $= \frac{(-1)(-2)(-3)\dots(-n)}{n!}x^n = (-1)^n x^n$

and the expansion is:

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots \qquad (-1<x<1)$$

which is the particular case (4) already written.


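(ii) $m=\frac12$: general term $= \frac{\frac12\left(-\frac12\right)\left(-\frac32\right)\dots\left(-n+\frac32\right)}{n!}x^n = (-1)^{n-1}\frac{1\cdot3\cdot5\dots(2n-3)}{2\cdot4\cdot6\dots2n}x^n \quad (n>1)$

and the expansion is:

$$\sqrt{1+x} = 1 + \frac12x - \frac{1}{2\cdot4}x^2 + \frac{1\cdot3}{2\cdot4\cdot6}x^3 - \frac{1\cdot3\cdot5}{2\cdot4\cdot6\cdot8}x^4 + \dots$$
$$= 1 + \tfrac12x - \tfrac18x^2 + \tfrac1{16}x^3 - \tfrac5{128}x^4 + \dots \qquad (-1<x<1).$$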
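(iii) $m=-\frac12$: general term $= \frac{\left(-\frac12\right)\left(-\frac32\right)\left(-\frac52\right)\dots\left(-n+\frac12\right)}{n!}x^n = (-1)^n\frac{1\cdot3\cdot5\dots(2n-1)}{2\cdot4\cdot6\dots2n}x^n$

and the expansion is:

$$\frac{1}{\sqrt{1+x}} = 1 - \frac12x + \frac{1\cdot3}{2\cdot4}x^2 - \frac{1\cdot3\cdot5}{2\cdot4\cdot6}x^3 + \frac{1\cdot3\cdot5\cdot7}{2\cdot4\cdot6\cdot8}x^4 - \dots$$
$$= 1 - \tfrac12x + \tfrac38x^2 - \tfrac5{16}x^3 + \tfrac{35}{128}x^4 - \dots \qquad (-1<x<1)$$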


A Binomial Series provides a rational approximation to the power
$(1+x)^m$ for a given x ($-1<x<1$). The last case, for example, gives
for x=0.1:

$$\frac{1}{\sqrt{1.1}} = 1 - 0.05 + 0.00375 - 0.00031 + 0.00003 - \dots = 0.9535 \text{ to four decimal places.}$$
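
The arithmetic of this approximation can be reproduced by summing the Binomial Series term by term. What follows is a minimal sketch, assuming ordinary floating-point arithmetic; `binomial_series` is a hypothetical helper name, not notation from the text.

```python
from math import sqrt

def binomial_series(m, x, terms):
    """Sum the first `terms` terms of 1 + m x + m(m-1)/2! x^2 + ... (intended for |x| < 1)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # Each term is the previous one multiplied by (m - n) / (n + 1) * x.
        term *= (m - n) / (n + 1) * x
    return total

approx = binomial_series(-0.5, 0.1, 5)   # five terms, as in the worked example
exact = 1 / sqrt(1.1)
print(round(approx, 4), round(exact, 4))
```

Five terms already agree with the directly computed value to the four decimal places quoted.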


11.8. Properties of expansions. Given f(x) with derivatives of all 
orders, Taylor’s Series provides the expansion of f(x) as a power series 


of the form $\sum \frac{f^{(n)}(0)}{n!}x^n$, for $|x|<r$, where the radius of convergence r
has to be found by consideration of the limit of the remainder $R_n$ as
$n \to \infty$. The converse situation is: given a power series $\sum a_nx^n$ and
knowing its radius of convergence r, what is the sum f(x)? If f(x) can
be found, then $f(x) = \sum a_nx^n$ for $|x|<r$. The coefficients $a_n$ are
identified: $a_n = \frac{f^{(n)}(0)}{n!}$. In practice, it is by no means always possible,
given the convergent power series $\sum a_nx^n$, to find its sum f(x) in
terms of known functions. We know f(x) exists, for the sum of a
convergent power series must be a function of x. If we cannot find it, 
we may suspect that we have a new function, something outside the 
established range of functions. This is, in fact, the situation for the 
power series of the examples of 11.6. Their sums are not algebraic 
functions; they are new functions to be explored in Chapter 12. 

One fact, implicitly assumed above, does serve to simplify matters: 
if $f(x) = \sum a_nx^n$ can be written for certain coefficients $a_0, a_1, a_2, \dots$,
then this expansion of f(x) is unique. Suppose $f(x) = \sum a_nx^n = \sum b_nx^n$.
Then:

$$f(x) = a_0 + a_1x + a_2x^2 + \dots + a_nx^n + x^n(a_{n+1}x + a_{n+2}x^2 + \dots)$$

i.e. $f(x) = a_0 + a_1x + a_2x^2 + \dots + a_nx^n + \epsilon(x)x^n$ where $\epsilon(x) \to 0$ as $x \to 0$.
Similarly:

$$f(x) = b_0 + b_1x + b_2x^2 + \dots + b_nx^n + \eta(x)x^n \quad\text{where } \eta(x) \to 0 \text{ as } x \to 0.$$

These relations are true for any integral n. Hence:

$$(a_0-b_0) + (a_1-b_1)x + (a_2-b_2)x^2 + \dots + (a_n-b_n)x^n + \{\epsilon(x)-\eta(x)\}x^n = 0$$

Let $x \to 0$: $a_0-b_0 = 0$ i.e. $a_0 = b_0$.
Then:

$$(a_1-b_1) + (a_2-b_2)x + \dots + (a_n-b_n)x^{n-1} + \{\epsilon(x)-\eta(x)\}x^{n-1} = 0.$$

Let $x \to 0$: $a_1-b_1 = 0$ i.e. $a_1 = b_1$
and so on. Generally $a_n = b_n$. Q.E.D.


Within its radius of convergence, a power series $\sum a_nx^n$ is absolutely
convergent. It is a property of absolutely convergent series, $\sum u_n$ and
$\sum v_n$, that they can be multiplied in just the same way as polynomials:

$$(u_1 + u_2 + u_3 + \dots)(v_1 + v_2 + v_3 + \dots) = u_1v_1 + (u_1v_2 + u_2v_1) + (u_1v_3 + u_2v_2 + u_3v_1) + \dots$$

The proof of this result, which is tricky if not difficult, is given in
15.6. It follows that two power series, $\sum a_nx^n$ and $\sum b_ny^n$, can be so
multiplied for all x within both radii of convergence. An example
illustrates:


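(i) $f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$: absolutely convergent, all x.

So: $f(y) = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \dots$: absolutely convergent, all y.

Hence

$$f(x)\times f(y) = \left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots\right)\left(1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \dots\right)$$
$$= 1 + (x+y) + \frac{x^2+2xy+y^2}{2!} + \frac{x^3+3x^2y+3xy^2+y^3}{3!} + \dots$$
$$= 1 + (x+y) + \frac{(x+y)^2}{2!} + \frac{(x+y)^3}{3!} + \dots$$

i.e. $f(x)\times f(y) = f(x+y)$.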
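
The grouping of products used in this example can be verified with exact rational arithmetic. The sketch below is an illustration only; `product_term` and `direct_term` are names assumed here, and the sample values of x and y are arbitrary.

```python
from fractions import Fraction
from math import factorial

def product_term(n, x, y):
    """Collect every product (x^i / i!) * (y^j / j!) with i + j = n."""
    x, y = Fraction(x), Fraction(y)
    return sum(x ** i / factorial(i) * y ** (n - i) / factorial(n - i)
               for i in range(n + 1))

def direct_term(n, x, y):
    """The term (x + y)^n / n! of the series for f(x + y)."""
    return (Fraction(x) + Fraction(y)) ** n / factorial(n)

# The two agree exactly, term by term, for any rational x and y tried here.
for n in range(6):
    print(n, product_term(n, 2, 3), direct_term(n, 2, 3))
```

Equality of every grouped term is what turns the product of the two series into the single series in x+y.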
As a last problem to explore, suppose $f(x) = \sum a_nx^n$, absolutely
convergent within the radius of convergence r, and ask the questions:
can we write $\int f(x)\,dx$ as the sum of the series obtained by integrating
$\sum a_nx^n$ term by term, and can we write $f'(x)$ as the sum of the series
obtained by writing derivatives of $\sum a_nx^n$ term by term? In short, can
we write the equations:

$$\int f(x)\,dx = a_0x + a_1\frac{x^2}{2} + a_2\frac{x^3}{3} + \dots + a_n\frac{x^{n+1}}{n+1} + \dots \quad (+\text{constant})$$
$$f'(x) = a_1 + 2a_2x + 3a_3x^2 + \dots + na_nx^{n-1} + \dots$$
This is a much more troublesome problem than might appear at first
sight. It can only be approached by writing $f(x) = \sum_{s=0}^{n} a_sx^s + R_n$, where
$R_n$ is the remainder in Taylor’s series such that $R_n \to 0$ as $n \to \infty$
for $|x|<r$. Then $\int f(x)\,dx = \sum_{s=0}^{n} a_s\frac{x^{s+1}}{s+1} + \int R_n\,dx$. The desired result
follows, provided that $\int R_n\,dx \to 0$ as $n \to \infty$. This would seem to be in
order, since integrals are ‘well-behaved’, provided only that the
functions concerned are continuous. It can, indeed, be established as
correct. On the other hand, in writing $f'(x) = \sum_{s=1}^{n} sa_sx^{s-1} + R_n'(x)$,
where $R_n'(x)$ is the derivative of $R_n$ as a function of x, we must
expect trouble. We can never be sure of being able to write derivatives
even for continuous functions. If we can, we must not expect that
$R_n'(x) \to 0$ simply because $R_n \to 0$ as $n \to \infty$. We give this problem up;
derivation term by term of a power series is not a safe process.

The first problem, that of integration term by term, is safe enough;
but the result is not easy to prove. We may argue: for any x within
the interval [-r, r], $R_n \to 0$ as $n \to \infty$ so that, given $\epsilon$, we can pick N
sufficiently large to ensure that $|R_n| < \epsilon$ for n>N. By the Mean
Value Theorem for integrals (11.1):

$$\left|\int_a^b R_n\,dx\right| = \left|[R_n]_{x=\alpha}\right|(b-a) < \epsilon(b-a) \quad\text{for } n>N$$

for any a and b of [-r, r] and for some $\alpha$ ($a<\alpha<b$). Hence, $\int_a^b R_n\,dx$
can be made as small as we please, by choice of $\epsilon$ and N, i.e.
$\int_a^b R_n\,dx \to 0$ as $n \to \infty$ for any a and b of [-r, r]. Take b=x, so that
$\int R_n\,dx = \int_a^x R_n\,dx \to 0$ as $n \to \infty$ for x within [-r, r]. For such x,
$R_n \to 0$ and $\int R_n\,dx \to 0$ as $n \to \infty$. Hence:

$$f(x) = \sum a_nx^n \quad\text{and}\quad \int f(x)\,dx = \sum a_n\frac{x^{n+1}}{n+1} + \text{constant}$$


and integration term by term is a valid process. 

There is a difficulty here and it is a subtle one. The remainder $R_n$
depends on x so that, in picking N for $|R_n| < \epsilon$ (n>N) we can only
specify N in terms of the particular x we have in mind. Hence N
depends on x and this throws out the subsequent line of argument as
given. The difficulty disappears if N is independent of x, so that
$|R_n| < \epsilon$ for n>N and for all x within [-r, r]. In this case, the power
series is said to be uniformly convergent, i.e. absolutely convergent at
all x of [-r, r] in such a way that the choice of $\epsilon$ and N does not
involve x. The result we need is that, within the radius of convergence,
a power series is uniformly as well as absolutely convergent. The 
result is correct and the proof, which needs some care, is given in 
15.6. Accepting it, we can proceed to integrate power series term by 
term. An example illustrates: 


(ii) $\frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots + (-1)^{n-1}x^{n-1} + \dots \qquad (-1<x<1)$.

So:

$$\int \frac{dx}{1+x} + \text{constant} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots + (-1)^{n-1}\frac{x^n}{n} + \dots \qquad (-1<x<1).$$

Take the integral over [0, x] so that the integral and series both
vanish when x=0 and the constant is zero:

$$\int_0^x \frac{dt}{1+t} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots + (-1)^{n-1}\frac{x^n}{n} + \dots \qquad (-1<x<1).$$


We start with a geometric series (a case of the Binomial Series of 
11.7); we obtain a series examined in example (iii)(a) of 11.6. Both 
are absolutely convergent for $-1<x<1$. The sum of the first series
is $\frac{1}{1+x}$. The sum of the second series is not known; if we write it
f(x), then $f(x) = \int_0^x \frac{dt}{1+t}$ and we do have a little extra information.
In fact, though $\frac{1}{1+x}$ is algebraic, $\int_0^x \frac{dt}{1+t}$ is not an algebraic function.
It is one of the new functions to be introduced.
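
The equality of the integrated series and the integral can be checked numerically. The following is a minimal sketch under stated assumptions: the integral is estimated by the midpoint rule (a choice made here, not a method from the text), and the helper names `series_sum` and `integral_midpoint` are hypothetical.

```python
def series_sum(x, terms):
    """Partial sum x - x^2/2 + x^3/3 - ... with the given number of terms."""
    return sum((-1) ** (n - 1) * x ** n / n for n in range(1, terms + 1))

def integral_midpoint(x, steps=10000):
    """Midpoint-rule estimate of the integral of 1/(1+t) from 0 to x."""
    h = x / steps
    return sum(h / (1 + (k + 0.5) * h) for k in range(steps))

# Inside the interval -1 < x < 1 the two columns agree closely.
for x in (0.5, -0.5, 0.9):
    print(x, series_sum(x, 200), integral_midpoint(x))
```

Near the ends of the interval many more terms of the series are needed for the same agreement, which reflects the slow convergence there.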


11.9. Exercises 

1. If $f(x) = x^3 - 3x^2 + 2x + 1$, show that f(x)=1 at x=0, 1, 2 and only at
these values. Use Rolle’s Theorem to show that $f'(\alpha) = 0$ for some $\alpha$ ($0<\alpha<1$)
and that $f'(\beta) = 0$ for some $\beta$ ($1<\beta<2$). Write $f'(x)$ and deduce that $\alpha = 1 - \frac13\sqrt3$
and that $\beta = 1 + \frac13\sqrt3$. Show also that $f'(x) = 0$ only at these two values.

2. For f(x) of Ex. 1, use the Mean Value theorem to show that $f'(\alpha) = 3$ for
some $\alpha$ ($1<\alpha<3$), and use $f'(x)$ to show that $\alpha = 1 + \frac23\sqrt3$. Is there any other
value of x such that $f'(x) = 3$?

3. Use (1) and (2) of 11.1 to show that $\sqrt{1+x} = 1 + \frac12x - \frac18x^2$ approximately
(x small), the remainder being $R = \frac{x^3}{16\sqrt{(1+\theta x)^5}}$ for some real $\theta$ ($0<\theta<1$). In
Taylor’s Theorem, take $f(x) = \sqrt{x}$ and show that

$$\sqrt{a+h} = \sqrt{a} + \frac{h}{2\sqrt{a}} - \frac{h^2}{8\sqrt{a^3}} + \frac{h^3}{16\sqrt{(a+\theta h)^5}}$$

Hence check the first result.
4. Taylor's Theorem: alternative form of remainder. Write F(x) as in the proof
of Taylor’s Theorem (11.1), with $F'(x) = -\frac{(b-x)^n}{n!}f^{(n+1)}(x)$. Show that

$$\int_a^b \frac{(b-x)^n}{n!}f^{(n+1)}(x)\,dx = -\int_a^b F'(x)\,dx = F(a).$$

Put b=a+h and write x=a+th ($0<t<1$) to deduce that

$$f(a+h) = f(a) + f'(a)h + f''(a)\frac{h^2}{2!} + \dots + f^{(n)}(a)\frac{h^n}{n!} + R_n$$

where $R_n = \frac{h^{n+1}}{n!}\int_0^1 (1-t)^n f^{(n+1)}(a+th)\,dt$.

Use the Mean Value Theorem for integrals (11.1) to show that

$$R_n = \frac{h^{n+1}}{n!}(1-\theta)^n f^{(n+1)}(a+\theta h) \qquad (0<\theta<1)$$

and obtain the remainder form (3) of 11.1 in the case a=0, h=x.


5. Apply the Mean Value Theorem for integrals to show that $\int_0^1 x^2\,dx = \alpha^2$
for some $\alpha$ ($0<\alpha<1$). Evaluate $\int_0^1 x^2\,dx$ and show that $\alpha = \frac{1}{\sqrt3}$.

6. The sign of the derivative. If $f'(x) > 0$ over the interval [a, b], show that
f(x) increases continuously over the interval. If c=f(a) and d=f(b), show also
that $f^{-1}(x)$ increases continuously over [c, d] with range [a, b].

7. Draw a graph of $y = x^3 - 3x^2 + 5$ and locate its single maximum and its
single minimum point. Similarly, from a graph of $y = x^3 - 3x^2 + 3x + 5$, show
that this function has only one stationary value, a point of inflexion.

8. Show that $y = 5x^3 - 3x^5$ has a single maximum value (at x=1), a single
minimum value (at x=-1), and in between a third stationary value which is
a point of inflexion (at x=0).

9. Maximum profits. If the revenue R(x) of a firm and its cost of production
C(x) both depend on its output x, show that profits are a maximum at output
$\alpha$ if $R'(\alpha) = C'(\alpha)$. Interpret as: marginal revenue = marginal cost. Write a
sufficient condition for maximum profits. Re-work 10.9 Ex. 16 on this basis.

10. The maximum and minimum values of a variable $z = x^2 + y^2$ are sought,
where x and y take all real values subject to 4x+3y=5. Eliminate y, obtaining
z as a function of x, and show that z=1 is the minimum value and that z has
no other extreme value.

*11. Functions of two variables. The real variable z is a function of u, where
u is the number pair (x, y), x and y real variables. Interpret: given x and y,
each from the domain of all real numbers, then z has a unique value f(x, y).
Show that the function is represented graphically by a surface in three
dimensions, referred to axes Oxyz, and interpret as a mapping of points in the
plane Oxy onto points on the surface, or onto points on the line Oz. Examine
the shape of the surface $z = x^2 + y^2$, for all real x and y, showing that cross
sections perpendicular to Oz are all circles.

*12. Partial derivatives. For z=f(x, y), if $\lim_{h \to 0}\frac{f(x+h,\,y) - f(x,\,y)}{h}$ exists,
call its value the partial derivative of z with respect to x and write it as
$z_x = f_x'(x, y)$, or as $\frac{\partial z}{\partial x} = \frac{\partial}{\partial x}f(x, y)$. Define the other partial derivative (with
respect to y). Interpret $\frac{\partial z}{\partial x} > 0$ at (x, y), and similarly $\frac{\partial z}{\partial y} > 0$. Show that necessary
conditions that z has a local maximum or minimum, for variation of both
variables around (x, y), are that $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} = 0$. Illustrate with $z = x^2 + y^2$, showing
that there is only one extreme value (a minimum at x=0, y=0).


*13. Constrained maximum and minimum values. Extreme values of
z=f(x, y) are sought subject to the constraint, the side relation $\phi(x, y) = 0$.
One method of treatment is that of Ex. 10. Develop the following alternative,
called the method of the Lagrange multiplier after Lagrange (1736-1813).
Write $w = f(x, y) - \lambda\phi(x, y)$ for any multiple $\lambda$. The unconstrained extreme
value of w corresponds to the constrained extreme value of z. Write $\frac{\partial w}{\partial x} = \frac{\partial w}{\partial y} = 0$
and show that a necessary condition for the constrained extreme value sought
is the equality of the ratios $f_x'(x, y) : \phi_x'(x, y)$ and $f_y'(x, y) : \phi_y'(x, y)$, each
being $\lambda$, to be taken in conjunction with $\phi(x, y) = 0$.

*14. Apply the necessary condition of Ex. 13 to the problem of Ex. 10,
obtaining the same constrained minimum of z. Interpret graphically the
maximum (or minimum) of z=f(x, y) subject to a linear restraint as the local
highest (or lowest) point on a section of the surface z=f(x, y) by a vertical
plane (parallel to Oz).

*15. Show that the problem of determining the rectangle of greatest area
which can be cut from a circular piece of cardboard of unit radius is equivalent
to finding the maximum of z=4xy relative to $x^2 + y^2 - 1 = 0$ (x>0, y>0).
Show that the (maximum) rectangle is a square of side $\sqrt2$.

*16. Show that the same x and y give an extreme value of u=f(x, y) relative
to $\phi(x, y)$ = constant and an extreme value of $v = \phi(x, y)$ relative to f(x, y) =
constant. Why would you expect a maximum of u to correspond to a minimum
of v? Express the problem of Ex. 15 in alternative ways.

*17. Joint production. A firm produces outputs x and y of two commodities,
and its given resources limit x and y according to a specific relation $\phi(x, y) = 0$
(x>0, y>0). The firm can sell its outputs at given prices, p and q respectively.
Show that a necessary condition for maximum revenue is: $\frac{q}{p} = \frac{\phi_y'(x, y)}{\phi_x'(x, y)}$
and interpret in marginal terms. If $\phi(x, y) = x^2 + y^2 - 1$,
and if p:q=4:3, show that maximum revenue is
obtained by the outputs $(\frac45, \frac35)$ given by the point P of
Fig. 11.9, where one of the lines 4x+3y=constant
touches the circle centre O and of unit radius. Relate this
solution to that of Ex. 10, taking note of the result of
Ex. 16.


Fig. 11.9

18. Infinite integrals. Show that a double limiting
process is involved in writing $\int_1^\infty f(x)\,dx$, i.e. first the
limit as $n \to \infty$ of the sum of the areas of n rectangles (similar to the sum of
an infinite series) and then the limit as the rectangles are refined (more and
thinner rectangles).


19. Integral Test for convergence: if f(x) is positive, continuous and
decreasing for all $x \ge 1$, and if $\int_1^\infty f(x)\,dx = L$ is a convergent infinite integral, then
$\sum u_n$ where $u_n = f(n)$ is convergent to sum $S < u_1 + L$. To prove, write
$v_n = u_n - \int_n^{n+1} f(x)\,dx$ and express $v_n = \int_n^{n+1}\{f(n) - f(x)\}\,dx = f(n) - f(\alpha)$ for
$n<\alpha<n+1$ by the Mean Value Theorem for integrals. Hence show that
$0 < v_n < f(n) - f(n+1)$ and that $\sum v_n$ is convergent with sum $< u_1$. Finally
$\sum v_n = \sum u_n - \int_1^\infty f(x)\,dx$ and $\sum u_n$ is convergent with sum $< u_1 + L$.


20. By the comparison theorem (11.4) with $v_n = 1/n$, show that $\sum(1/n^p)$ is not
convergent if $0<p\le1$, and by the integral test (Ex. 19) show that $\sum(1/n^p)$ is


convergent if p>1. Deduce that 1+ 5-5 δ - a+ teat gto is convergent, and that 
1 : ΓΟΥ͂Ν +... is conditionall t 
αν 5, τα" nditionally convergent. 


21. From d’Alembert’s test in limit form, establish that $\sum n^kr^n$, for positive
k, is convergent $0<r<1$ and not convergent $r>1$. Deduce that
1 +/2(8) Ὁ ν8(4)5 +... 
is convergent. 

22. Speed of convergence. Given that the common logarithm $\log_{10} 2$ is the
sum of the series $0.4343(1 - \frac12 + \frac13 - \frac14 + \dots)$, it would appear that more than
200 terms are needed to give $\log_{10} 2$ to two decimal places (0.30). Increase the
speed of convergence by writing the terms in pairs: $1 - \frac12 = \frac12$, $\frac13 - \frac14 = \frac1{12}$, ....
Show that the nth pair is 1/[2n(2n-1)] and that about 20 pairs are needed to
get $\log_{10} 2 = 0.30$. Examine $\pi = 4(1 - \frac13 + \frac15 - \frac17 + \dots)$ similarly.

23. Alternative series for $\log_{10} 2$ are 0.4343 times either:

26 τὸ) Ἐξ ()" ἘΣ (2)5-Ὁ...} or 80 4+3(3)? Ἐξ (4) +7 (5) Ἑ...} 
Show that these converge so rapidly that $\log_{10} 2 = 0.30$ (to two decimal places)
is got by taking 5 or 6 terms of the first and only two terms of the second series.
How many terms for the third decimal place ($\log_{10} 2 = 0.301$)?

24. Recurring decimals. A recurring decimal is an infinite geometric series, 
the sum being the neta se Illustrate with: 


_ 58, 1 te) eG 
0583 = 100 “am tpt Io 5.) ΤΩ 
re 18 
and ἀν 61. tat .)Ξ π᾿ 


25. Write 


ata 1 1. )- 
ὃς {1 Ἐτ ττρε τ.) =! 


and deduce that a terminating decimal can be shown as a recurring decimal,
e.g. 0.64 = 0.63999.... See also 2.9 Ex. 1.

*26. Real numbers and decimals. Illustrate and prove the following series of
properties relating non-terminating decimals $0.b_1b_2 \dots b_n \dots$ to real numbers
a, 0<a<1: (i) any such decimal corresponds to one real number and no two
decimals correspond to the same real number; (ii) any recurring decimal
corresponds to a rational number and conversely; (iii) any non-recurring
decimal corresponds to an irrational number and conversely.




*27. The irrational e. Define $e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots$ and suppose e is rational,
i.e. $e = \frac{p}{q}$ where p and q are positive integers. If $S_q$ is sum of (q+1) terms of the
series, show that

$$(e - S_q)(q!) = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \dots < \frac{1}{q}.$$

Further, show that $(e - S_q)(q!)$ = positive integer. Deduce that e is
irrational.
28. Show that x2 +425 +4254+..., absolutely convergent for |x|<1, is not 
even conditionally convergent when «=+1. 
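*29. Hypergeometric series:

$$1 + \frac{\alpha\beta}{1\cdot\gamma}x + \frac{\alpha(\alpha+1)\,\beta(\beta+1)}{1\cdot2\cdot\gamma(\gamma+1)}x^2 + \frac{\alpha(\alpha+1)(\alpha+2)\,\beta(\beta+1)(\beta+2)}{1\cdot2\cdot3\cdot\gamma(\gamma+1)(\gamma+2)}x^3 + \dots$$

Write the term in $x^n$. If $\alpha$, $\beta$ and $\gamma$ are any constants, other than negative
integers, show that the series is a power series absolutely convergent for
$|x|<1$. What happens to the series if $\alpha$, $\beta$ or $\gamma$ is a negative integer?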


30. If 0<x<1, write $\frac{1}{\sqrt{1-x^2}}$ as a series in ascending even powers of x.
Hence, show that $\frac{1}{\sqrt{0.99}} = 1.00504$ to five decimal places, checking that only
three terms are needed.
31. In the product $f(x)\times f(y)$ of example (i) of 11.8, show that the (n+1)th
term is:

$$\frac{1}{n!}\left\{x^n + nx^{n-1}y + \frac{n(n-1)}{2!}x^{n-2}y^2 + \dots + nxy^{n-1} + y^n\right\}$$

and identify as $\frac{1}{n!}(x+y)^n$ by the Binomial Theorem.


32. Integrate the expansions of sp arc and of τς : , term by term to get 


i dt _. tat ΟΝ Ἴ 

ie. Ὃϑ 8. ὃ 7 

( ie ee ae (-l<a#<1). 
a δ Τ᾽ 


33. If $f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$, show that $\int f(x)\,dx = f(x)$ + constant, i.e.
f(x) is such that it is its own derivative. (It is the exponential function of 12.2
below.)


CHAPTER 12 


ELEMENTARY FUNCTIONS 


12.1. Defining new functions. The specific functions so far considered 
are all of algebraic form, obtained from a real variable x and 
appropriate constants by the algebraic processes of addition, sub- 
traction, multiplication, division and root extraction. They include 
polynomials and ratios of polynomials, the basic element being the 
power x" for integral n. They include various expressions involving 
powers such as $x^r = \sqrt[q]{x^p}$ where r is a rational number p/q (q>0). It is
now time to extend the variety of specific functions. 

As an illustration, consider example (ii) of 11.8. The function
y=1/(1+x) is of simple algebraic type, continuous and decreasing
over the domain x>-1. The graph of Fig. 12.1 shows a curve passing
smoothly from left to right. The function has an integral: $f(x) = \int_0^x \frac{dt}{1+t}$
where the lower end of the interval of
integration is taken so that f(0)=0.
For given x, f(x) is represented by the
shaded area of Fig. 12.1, the area under
the curve y=1/(1+x) from 0 to x. In the
definition of an integral, we carry out
an algebraic process: summing areas of
rectangles and proceeding to a limit.
We might well expect that the result,
the integral f(x), is of algebraic form.

Fig. 12.1

Looking at the problem from another angle, we write the function
1/(1+x) as the sum of the power series $1 - x + x^2 - x^3 + \dots$ for $|x|<1$
and we get:

$$f(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots \quad\text{with } f(0) = 0.$$

If x is small, say small enough for $x^4$ to be neglected in an
approximation, then $1/(1+x) = 1 - x + x^2 - x^3$ and $f(x) = x - \frac12x^2 + \frac13x^3$
approximately, as cubic polynomials. The exact representation of each is
obtained by summing n terms of the series and by proceeding to the
limit. Again, since we start from an algebraic expression 1/(1 +2), we 
might well expect an algebraic result to emerge for f(x). 

The surprising fact we now have to face is that our expectation is 
not justified. The limiting process, to get an integral or the sum of an 
infinite series, takes us outside the range of algebraic expressions. It 
produces a function, to be written log (1+x), of entirely new and
non-algebraic type.

There are two ways of defining new functions. One is as the 
indefinite integral of a known function of x. If the result is not recognised
as a function of known type, then it can be written as defining
a new function. The other way is as the sum of an infinite power series 
in ascending powers of x. This sum is again a function of x and, if it 
is apparently not of known type, then it can be defined as a new 
function. The two methods usually link up, i.e. a function defined 
as an integral can be expanded as a power series, and conversely. 
Clearly we have a choice: we can use one method as the definition of 
a new function, and the other method provides a property of the 


function. For example, we may write $\log(1+x) = \int_0^x \frac{dt}{1+t}$ as a definition
and get $\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots$ as a property; or we may
proceed conversely. In either case, we may make a mistake in failing
to recognise the ‘new’ function as an ‘old’ one. As far as we know at
the outset, what we have written as log (1+x) may turn out to be
such an algebraic expression as $\sqrt{1+x} - 1/\sqrt{1+x} - x^3/24$. Here it
is not, but it is always a possibility.* But there’s no harm done. If
we start by writing log (1+x) and then find it is really a known
function, we can just scrub out the log (1+x) notation, or indeed
we may opt to carry it as a convenient short-hand for the known
form.
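
How far the algebraic expression and the logarithmic series agree can be settled by comparing coefficients exactly. The sketch below is an illustration only: it reads the algebraic expression as $\sqrt{1+x} - 1/\sqrt{1+x} - x^3/24$ and takes the series for log (1+x) as $x - x^2/2 + x^3/3 - x^4/4 + \dots$; the helper name `binomial_coeffs` is an assumption introduced here.

```python
from fractions import Fraction

def binomial_coeffs(m, count):
    """Coefficients of 1, x, x^2, ... in the Binomial Series for (1+x)^m."""
    coeffs, c = [], Fraction(1)
    for n in range(count):
        coeffs.append(c)
        c = c * (m - n) / (n + 1)
    return coeffs

half = Fraction(1, 2)
root = binomial_coeffs(half, 5)        # sqrt(1+x)
inv_root = binomial_coeffs(-half, 5)   # 1/sqrt(1+x)

# Coefficients of sqrt(1+x) - 1/sqrt(1+x) - x^3/24 up to x^4.
algebraic = [a - b for a, b in zip(root, inv_root)]
algebraic[3] -= Fraction(1, 24)

# Coefficients of x - x^2/2 + x^3/3 - x^4/4.
log_series = [Fraction(0)] + [Fraction((-1) ** (n - 1), n) for n in range(1, 5)]

for power, (a, b) in enumerate(zip(algebraic, log_series)):
    print(power, a, b, a == b)
```

On this reading the two sets of coefficients coincide up to the cubic term and part company at the fourth power.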

The choice of which method to adopt as a definition is not so much 
a question of logic, for each method can be made equally strict. It is 
more a matter of getting a neat and economical definition from which 
all desired properties flow easily. The balance lies in favour of using 


* The algebraic expression given and log (1+x) have the same power series up to
and including the term in $x^3$ but not beyond.




an integral as a definition. For, nothing more is needed as a basis 
than the Fundamental Theorem of the Calculus; if the original 
function is continuous, the integral exists, is continuous and has a 
known derivative (the original function). All the new functions of the
present chapter are found to stem from just two integrals: $\int \frac{dt}{t}$ and
$\int \frac{dt}{1+t^2}$. The first gives the exponential, logarithmic, power and
hyperbolic functions; the second leads to the circular or trigonometric
functions and their inverses.* An outline of this development is given 
in 15.7 and 15.8. 

Against this, the definition of functions as power series requires a 
more elaborate foundation: the theory of convergence of power series. 
It is much less neat and economical. In the present development, 
however, we have already invested a good deal in the construction 
of the necessary theory (Chapter 11). Further, we find that the ex- 
pansions of the new functions are of central importance in their 
manipulation. Consequently, at the cost of a certain mathematical 
elegance, we follow the method of defining functions as power series. 

For the most part we find we can get by with the series:

$$1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$

together with the similar series:

$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots \quad\text{and}\quad x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

All are absolutely convergent for all x. A basic constant, due to
Euler (1707-83), is a particular case (x=1) of the first series. It is
denoted e:

$$e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots = 2.71828\dots$$

We shall also make use of the series:

$$x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots$$


* Other new functions can be defined by integrals, e.g. the B and Γ functions of
12.9 Ex. 28 and 29. More generally, new functions can be defined by differential
equations, e.g. Bessel functions of 14.9 Ex. 5 below. Many of the new functions are
cases of a wide class of functions F(x; α, β, γ) in three parameters, derived from the




This is absolutely convergent for $-1<x<1$ and conditionally
convergent for $x=\pm1$ (11.6 above). A constant given as a particular
case (x=1) is:

$$\pi = 4\left(1 - \tfrac13 + \tfrac15 - \tfrac17 + \dots\right) = 3.14159\dots$$

This turns out to be the constant $\pi$ of Archimedes (circa 250 B.C.),
familiar in elementary geometry. 
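
The two constants can be approximated directly from the series that define them, which also shows how differently the two series behave. The following is a minimal sketch in floating-point arithmetic; the names `e_partial` and `pi_partial` are assumptions introduced here.

```python
from math import factorial

def e_partial(terms):
    """Sum of the first `terms` terms of 1 + 1 + 1/2! + 1/3! + ..."""
    return sum(1 / factorial(n) for n in range(terms))

def pi_partial(terms):
    """Four times the first `terms` terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

print(e_partial(10))     # ten terms of the series for e
print(pi_partial(1000))  # a thousand terms of the series for pi
```

Ten terms fix e to five decimal places, whereas a thousand terms of the conditionally convergent series still leave the third decimal place of $\pi$ uncertain.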

One further step can be taken in this process of unification. Once 
we admit complex numbers as exponents in the power function, and 
as the variable in a power series, we can eliminate even the distinc- 
tions we have just made. Powers of the constant e with rational, real 
or complex exponents pull together the whole lot. The range of 
functions discussed here is unified by means of the exponential 
function $e^x$, a function of most remarkable flexibility and power.
There is even a link between the basic constant e=2.71828... and
the other constant $\pi$=3.14159... . The relation which turns this trick
is: $e^{i\pi} = -1$. This development is perhaps the most amazing, the most
exciting, of all mathematics.


12.2. The exponential function. The power series $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$
is absolutely (and uniformly) convergent for all real x and its sum is
a function, the exponential function:*

DEFINITION: The exponential function exp x, read ‘exponential x’,
is: $\exp x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$ defined for all real x.

The basic property is obtained by multiplication of series:


$$\exp x \times \exp y = \exp(x+y) \quad\text{for all real } x \text{ and } y \qquad (1)$$

This is established in example (i) of 11.8, using the result that
absolutely convergent series can be multiplied. From the definition:
exp 0=1. Hence:

$$\exp x \times \exp(-x) = \exp(x-x) = \exp 0 = 1$$


hypergeometric series (12.9 Ex. 31). The range of non-algebraic functions used in 
applied mathematics (e.g. by the physicist or the engineer) is quite extensive; a de- 
tailed analysis of them is given by E. T. Whittaker and G. N. Watson: Modern 
Analysis (Cambridge University Press, 1902). 

* ‘Exponential’ is an adjective formed from ‘exponent’, which derives from the 
Latin: ex =out of, and pono, ponere =to place. 


So: $\exp(-x) = \frac{1}{\exp x} \qquad (2)$

and: $\frac{\exp x}{\exp y} = \exp x \times \exp(-y) = \exp(x-y)$

Further: $(\exp x)^2 = \exp x \times \exp x = \exp 2x$


as another case of the basic result (1). This line can be developed, 
exactly as in Appendix A. 1, to give any rational power of exp x: 


$$(\exp x)^r = \exp(rx) \quad\text{for rational } r \qquad (3)$$
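
Properties (1), (2) and (3) can be checked numerically straight from the defining series. This is a minimal sketch, assuming that 60 terms of the series are ample for the small arguments used; `exp_series` is a hypothetical helper name and the sample values are arbitrary.

```python
from math import isclose

def exp_series(x, terms=60):
    """Sum 1 + x + x^2/2! + x^3/3! + ... using the first `terms` terms."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # next term: x^(n+1) / (n+1)!
    return total

x, y, r = 1.2, -0.7, 1 / 3
# (1): exp x * exp y = exp (x + y)
print(isclose(exp_series(x) * exp_series(y), exp_series(x + y), rel_tol=1e-12))
# (2): exp (-x) = 1 / exp x
print(isclose(exp_series(-x), 1 / exp_series(x), rel_tol=1e-12))
# (3): (exp x)^r = exp (r x), here with r = 1/3
print(isclose(exp_series(x) ** r, exp_series(r * x), rel_tol=1e-12))
```

The checks use only the series itself, so they do not presuppose any built-in exponential function.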


The power series for exp x can be integrated term by term to give the 
integral of exp x, including an additive and arbitrary constant:

$$\int \exp x\,dx = \text{constant} + \int 1\,dx + \int x\,dx + \int \frac{x^2}{2!}\,dx + \int \frac{x^3}{3!}\,dx + \dots$$
$$= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \exp x$$

where the constant is put equal to 1 and the arbitrary constant
dropped for convenience. Inversely: D(exp x)=exp x. Hence the
standard forms:

$$D(\exp x) = \exp x \quad\text{and}\quad \int \exp x\,dx = \exp x \qquad (4)$$


These could not be simpler; all derivatives and integrals of exp x 
are exp x. They can be extended to y=exp u where u=f(x). The
rules for derivatives of composite functions (10.3) and integration
by substitution (10.7) give:

$$D_xy = D_uy \cdot u' \quad\text{and}\quad \int yu'\,dx = \int y\,du$$

i.e. $D\exp f(x) = \exp f(x)\,f'(x)$ and $\int \exp f(x)\,f'(x)\,dx = \exp f(x) \qquad (5)$

These two results are seen, directly, to be each the inverse process to
the other. As a particular case of (5), take f(x)=kx, where k is a
constant and $f'(x) = k$:

$$D\exp(kx) = k\exp(kx) \quad\text{and}\quad \int \exp(kx)\,dx = \frac{1}{k}\exp(kx) \qquad (6)$$

Apart from exp 0=1, the other special value needed is exp 1.




This value is written as the constant e, an irrational number (see 
11.9 Ex. 27): 

NOTATION: $e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots = 2.71828\dots$.

This constant turns out to be, not only an irrational, but also a 
transcendental number. It is not the root of any polynomial equation 
with rational coefficients. A rational approximation, to any desired 
number of decimal places, is obtained by adding a sufficient number 
of terms of the series, as in 11.5 above. The approximation written 
here is to five decimal places. 

From (3), it follows that $(\exp 1)^r = \exp r$ for any rational r, i.e.
$e^r = \exp r$, where $e^r$ is a power with a rational exponent in the
elementary sense of Appendix A.1. If x is not rational, $e^x$ has no
meaning as yet. We are, therefore, free to define it in any way we
find convenient. It is now clear what we have to do, i.e. we write
$e^x = \exp x$ as a matter of notation:

Notation: the xth power of $e = e^x = \exp x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$ for
any real x.

If x happens to be rational, then $e^x$ is the ordinary power of elementary
algebra, e.g. $e^3 = e \times e \times e$ and $e^{1/2} = \sqrt{e}$. If x is not rational, we have
simply opted to take $e^x$ as exp x. The reason is seen when the
properties (1) and (2) are translated:


$$e^x \times e^y = e^{x+y}; \qquad e^x / e^y = e^{x-y} \tag{7}$$


In other words, $e^x$ satisfies the familiar properties of powers, now for
x any real value and not only for x rational.

We now have two alternative notations exp x and $e^x$ for the same
thing. We do not need to keep both of them. We can drop exp x and
stick to $e^x$, which is clearly more convenient when multiplying and
dividing as in (7), as compared with (1) and (2). The standard forms
(4) also translate easily:

$$D e^x = e^x \quad\text{and}\quad \int e^x\,dx = e^x$$


and similarly for (5) and (6). However, it is convenient to maintain
both notations, for the ‘exp’ form is much easier to write when the
exponential of a complicated expression is taken. So exp f(x) can be
used as well as, or instead of, $e^{f(x)}$. For example, in statistics, the
normal distribution is:

$$y = y_0\,e^{-\frac{1}{2}(x-a)^2/\sigma^2} \quad (a \text{ and } \sigma \text{ parameters})$$

which can be written more easily as $y = y_0 \exp\{-\tfrac{1}{2}(x-a)^2/\sigma^2\}$.

12.3. The logarithmic function. The exponential function $e^x$ is
continuous, with derivatives and integrals of all orders. If x is
rational, then $e^x$ is a power of a positive constant e>1 in the
elementary sense of Appendix A.1. Hence, $e^x > 0$ for all rational x, and $e^x \to \infty$
as $x \to \infty$, $e^x \to 0$ as $x \to -\infty$, through rational values. By continuity,
the same results hold for all real x. The graph of $y = e^x$ is the familiar
growth curve shown in Fig. 12.3. The function and curve are increasing
for all x, since $D e^x = e^x > 0$ for all x.

As a continuous and increasing function, $y = e^x$ has an inverse which is also
continuous and increasing. Moreover, both the function and its inverse have
derivatives and integrals of all orders. The inverse could be described as the
‘inverse exponential’ function and denoted by ‘exp⁻¹’. It is, however, so
important in its own right that it is given a separate name: the logarithmic
function,* denoted by ‘log’. Hence, if $y = e^x$, then $x = \log y$; equally, if $x = e^y$,
then $y = \log x$.


DEFINITION: The logarithmic function y=log x, read ‘logarithm of x’, is the
inverse of the exponential function:
if $x = e^y$, then y=log x defined for all x>0.

[Fig. 12.3]

The graph of y=log x is simply that of $x = e^y$. Hence, given the graph
of $y = e^x$, interchange of axes produces the graph of y=log x, as in
Fig. 12.3. Two particular values are to be noted:
log 1=0 and log e=1.


* ‘Logarithm’ is derived from the Greek : logos =reckoning, and arithmos =number. 




Hence, as an increasing function, y=log x is negative for 0<x<1
and positive for x>1. It is not defined for x≤0.

Properties of logarithms are obtained as direct reflections of
properties of exponentials. Let u=log x and v=log y, so that $x = e^u$
and $y = e^v$. Then:

$$xy = e^u \times e^v = e^{u+v} \quad\text{i.e.}\quad \log xy = u + v = \log x + \log y.$$

Similar results for 1/x and for x/y are obtained. These results are of
the greatest importance, both in theory and practice:

$$\log xy = \log x + \log y; \quad \log(1/x) = -\log x; \quad \log(x/y) = \log x - \log y \tag{1}$$

for any positive real values of x and y. Two other results, of a partial
kind and subject to later extension (12.4 below), can be usefully
set out. From (1), $\log x^2 = \log x + \log x = 2\log x$; this extends to
$\log x^n = n \log x$. Further, if $w = \log(e^x)$, then $e^w = e^x$ and w=x,
i.e. $\log(e^x) = x$. So:

$$\log x^n = n \log x \ (n \text{ positive integer}) \quad\text{and}\quad \log e^x = x \tag{2}$$

The derivative of log x is obtained from that of $e^x$ by the rule for
inverse functions (10.3). If y=log x, then $x = e^y$ and:

$$D_x y = 1/D_y x = 1/D_y e^y = 1/e^y = 1/x.$$

Hence the standard form and its extension by the composite function
rule:

$$D \log x = \frac{1}{x} \text{ for } x>0 \quad\text{and}\quad D \log f(x) = \frac{f'(x)}{f(x)} \text{ for } f(x)>0 \tag{3}$$

It is not profitable at this stage to seek the integral of log x, though
the integral exists. On the other hand, the inverse of (3) is the
standard form:

$$\int \frac{1}{x}\,dx = \log x \quad (x>0) \tag{4}$$
An arbitrary constant is to be added to (4). Alternatively, a lower
end of the interval of integration can be specified. Since log 1=0,
we have:

$$\int_1^x \frac{1}{t}\,dt = \log x \quad (x>0).$$

This form of (4) is the appropriate (alternative) definition of log x
as an integral. If this definition is adopted, the exponential function
comes as the inverse of the logarithmic function, and not (as here)
the other way round. The same integral gives a new form (and
alternative definition) for e: $\int_1^e \frac{dt}{t} = \log e = 1$. It is also to be noticed
that (4) completes the standard form already given (10.7):

$$\int x^r\,dx = \frac{x^{r+1}}{r+1} \quad (r \neq -1).$$

The derivative f’ (x) measures the rate of change of f(x), from the 
definition in 10.2. The units of measurement are so much of f(x) per 
unit of x, e.g. velocity as the rate of change of distance over time 
can be measured in miles per hour or in feet per second. Consider the 
related concept of the proportionate rate of change, i.e. the rate of 
change of f(x) as a proportion or percentage of f(x) itself:


DEFINITION: The proportionate rate of change of y=f(x) is:

$$\frac{1}{y}Dy = \frac{f'(x)}{f(x)} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h\,f(x)}$$

Another notation often used is $\frac{1}{y}\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{1}{y}\frac{\Delta y}{\Delta x}$, where $\Delta x$ and $\Delta y$
are corresponding increments in x and y. Standard form (3) gives:

Proportionate rate of change $\frac{1}{y}Dy = D \log y$ (y>0).

Hence, the rate of change of y is given by the derivative of y and the
proportionate rate of change by the derivative of log y.

The units used for measuring y appear in the rate of change but
are eliminated in the proportionate rate of change. For example, if
$y = 2x^2$ where x is time in years, then $Dy = 4x = 16$ and $\frac{1}{y}Dy = \frac{2}{x} = \frac{1}{2}$
at x=4, i.e. y is then increasing by 16 units per year and by 50 per
cent per year.

For the exponential function $y = y_0 e^{rx}$, where $y_0$ is positive (the
value at x=0) and where r is a positive constant:

$$\log y = \log y_0 + rx \quad\text{and}\quad \frac{1}{y}Dy = D \log y = r.$$

The function $y = y_0 e^{rx}$ has the property that y grows at a constant
proportionate rate r, i.e. steadily at 100r per cent per unit of x. The
converse is also true (12.9 Ex. 9). For this reason, the exponential
function has a wide range of applications, in such problems as
compound interest and population growth (12.9 Ex. 13 and 15).
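The statement that the proportionate rate of change is the derivative of log y can be checked numerically. The sketch below is an illustration only: the helper `proportionate_rate`, the step size `h` and the sample values of `y0` and `r` are assumptions chosen for the example.

```python
from math import exp, log

def proportionate_rate(f, x, h=1e-6):
    """Central-difference estimate of D log f(x), i.e. f'(x)/f(x), for f(x) > 0."""
    return (log(f(x + h)) - log(f(x - h))) / (2 * h)

# Exponential growth y = y0 * e^(r x): the proportionate rate is r at every x.
y0, r = 5.0, 0.03
growth = lambda x: y0 * exp(r * x)
rates = [proportionate_rate(growth, x) for x in (0.0, 1.0, 10.0)]
print(rates)

# A non-exponential function, here y = 2 x^2, has a rate that varies with x.
quadratic = lambda x: 2 * x ** 2
print(proportionate_rate(quadratic, 4.0))   # close to 2/x = 0.5 at x = 4
```

The exponential gives the same rate at every point, while the quadratic gives 2/x, which falls as x increases.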

It remains to write the expansion of the logarithmic function. This
is done, not for log x, but for log (1+x). The reason is that
log (1+x)=0 at x=0 but log x is not defined at x=0. Hence
log (1+x) is defined in the interval -1<x<1 around x=0. To
expand log (1+x), we repeat what we said in 12.1:

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots \quad (-1<x<1)$$

and

$$\log(1+x) = \int_0^x \frac{dt}{1+t} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots \quad (-1<x<1) \tag{5}$$

The expansion (5) is absolutely convergent in the interval shown;
it is also conditionally convergent at x=1, as found in 11.6 above.
However, the series (5) is not convergent at x=-1, a reflection of the
fact that log 0 is not defined.
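The behaviour of the expansion (5) inside the interval and at the end point x=1 can be seen in a short numerical sketch. The function name `log1p_series` and the chosen numbers of terms are assumptions made for illustration.

```python
from math import log

def log1p_series(x, terms):
    """Partial sum x - x^2/2 + x^3/3 - ... of the series for log(1 + x)."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

# Well inside the interval the series converges quickly.
print(log1p_series(0.5, 60), log(1.5))

# At x = 1 convergence is only conditional and very slow.
print(log1p_series(1.0, 1000), log(2.0))
```

At x=0.5 sixty terms reproduce log 1.5 to machine accuracy, while at x=1 a thousand terms give log 2 to only about three decimal places.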

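The logarithmic function can be used to derive an important limit:

THEOREM: $e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$ for any real x and integral n.

Proof: in $D \log f(x) = f'(x)/f(x)$, write $f(x) = 1 + ux$, $f'(x) = u$, where
u is any given real value. So:

$$D \log(1+ux) = \frac{u}{1+ux} \quad\text{and}\quad \left[D \log(1+ux)\right]_{x=0} = u.$$

By the definition of the derivative (at x=0):

$$\left[D \log(1+ux)\right]_{x=0} = \lim_{h \to 0} \frac{\log(1+uh) - \log 1}{h}$$

so that $\frac{1}{h}\log(1+uh) \to u$ as $h \to 0$.
Write $h = \frac{1}{n}$ and let $n \to \infty$ through integral values:

$$\lim_{n \to \infty} n \log\left(1 + \frac{u}{n}\right) = u.$$

By (2) above: $\lim_{n \to \infty} \log\left(1 + \frac{u}{n}\right)^n = \log e^u$.
Since the logarithmic function is continuous, this implies that:

$$\lim_{n \to \infty} \left(1 + \frac{u}{n}\right)^n = e^u$$

which is the result to be proved, on switching from u to x. Q.E.D.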


Hence, the exponential function is the limit of an algebraic form:

$$\lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = e^x \quad\text{and}\quad \lim_{n \to \infty}\left(1 - \frac{x}{n}\right)^n = e^{-x} \tag{6}$$

As a particular case, (6) gives e as a limit: $\lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n = e$. Again
we see how a non-algebraic function can be got by first writing an
algebraic expression and then by proceeding to a limit.
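The approach of the algebraic form to the exponential can be watched numerically. This is a sketch only; the function name `compound` is an assumption, suggested by the compound-interest reading of the formula.

```python
from math import exp

def compound(x, n):
    """The algebraic form (1 + x/n)^n whose limit as n grows is e^x."""
    return (1 + x / n) ** n

for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, compound(1.0, n), compound(-1.0, n))
print("limits:", exp(1.0), exp(-1.0))
```

The convergence is slow: the error is roughly proportional to 1/n, so each extra decimal place of e needs about ten times as large an n.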


12.4. Power functions. The notation $e^x$ for the xth power of e is most
useful. It agrees with the ordinary concept of a power when the
exponent is rational and it satisfies the basic property of powers
($e^x \times e^y = e^{x+y}$) for all real exponents. This is the aspect of the
exponential function now to be developed.

Consider $a^b$ where a and b are any real numbers. So far we have
defined this expression only in two cases: when b is rational and a a
real number; when a=e and b a real number. It can now be extended
to all cases, provided only that a>0. If a is real and positive, log a
is defined and so is $e^{b \log a}$ for any real b. This expression provides the
definition of the power $a^b$:

DEFINITION: The power $a^b$ for any real a>0 and any real b is:
$$a^b = e^{b \log a}.$$
Since $\log e^x = x$, so $\log a^b = \log(e^{b \log a}) = b \log a$. The definition,
therefore, is equivalent to the property:
$$\log a^b = b \log a \tag{1}$$
The two results of 12.3 are included in and so extended by (1):
$\log a^n = n \log a$ (n positive integer) and $\log e^x = x$.

The second follows from log e=1. Further, the basic properties of
powers extend at once to $a^x$. For:

$$a^x \times a^y = e^{x \log a} \times e^{y \log a} = e^{(x+y)\log a} = a^{x+y}.$$
Similarly: $(ab)^x = a^x b^x$. Hence for a>0 and b>0 and real x and y:
$$a^x \times a^y = a^{x+y} \quad\text{and}\quad (ab)^x = a^x b^x \tag{2}$$
Particular cases of (2) are: $a^x/a^y = a^{x-y}$ and $1/a^x = a^{-x}$.
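The definition of a power through the exponential and logarithm can be tried out directly. In this sketch the function name `power` and the sample values are assumptions made for illustration; Python's built-in `**` operator is used only as an independent check.

```python
from math import exp, isclose, log, sqrt

def power(a, b):
    """a^b defined as e^(b log a); only meaningful for a > 0."""
    if a <= 0:
        raise ValueError("the definition requires a > 0")
    return exp(b * log(a))

b = sqrt(2)                      # an irrational exponent
print(power(2.0, b), 2.0 ** b)   # the two values agree to rounding error

# Property (2): a^x * a^y = a^(x+y) and (ab)^x = a^x * b^x
print(isclose(power(2.0, 0.5) * power(2.0, 1.5), power(2.0, 2.0)))
print(isclose(power(6.0, b), power(2.0, b) * power(3.0, b)))
```

For a rational exponent the definition gives back the ordinary power, for instance `power(9.0, 0.5)` is 3 up to rounding.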


An extension of the concept of a logarithm is implied by the
development of powers of e into powers of a>0. If $x = a^y = e^{y \log a}$
then $\log x = y \log a$, i.e. $y = \frac{\log x}{\log a}$ is the inverse of $x = a^y$. As a matter of
notation, this inverse is written $\log_a x$, read ‘logarithm of x to the
base a’:

NOTATION: if $x = a^y$, then $y = \log_a x = \frac{\log x}{\log a}$ and a is called the base
of the logarithm $\log_a x$.

As a check, write a=e. So, if $x = e^y$, then $y = \log_e x = \frac{\log x}{\log e} = \log x$.
There is agreement; log x is simply $\log_e x$ and the basic concept of a
logarithm (12.3) is that of a logarithm to base e. We continue to use
log x instead of $\log_e x$, suppressing the base only when it is e.


Hence, for logarithms to any positive base a, the basic property is:

$$\log_a x = \frac{\log x}{\log a} \tag{3}$$

which simply shows how any logarithm $\log_a x$ to base a can be
written in terms of logarithms to base e. Indeed, (3) shows that the
switch from log x to $\log_a x$ or conversely is no more than a change of
unit:

$$\log_a x = \left(\frac{1}{\log a}\right)\log x \quad\text{and}\quad \log x = (\log a)\log_a x.$$

Logarithms to base e are all multiplied by the constant $\left(\frac{1}{\log a}\right)$ to
get corresponding logarithms to base a, and similarly for the reverse
switch. As a result, all the properties of logarithms, (1) of 12.3 and
(1) above, carry over:

$$\log_a xy = \log_a x + \log_a y; \quad \log_a \frac{1}{x} = -\log_a x; \quad \log_a \frac{x}{y} = \log_a x - \log_a y \tag{4}$$
and $\log_a x^b = b \log_a x$.

Logarithms to base e are called natural or Naperian logarithms after 
Napier (1550-1617); they are used here unless the contrary is 

specified. The logarithms used in arithmetical work are to the con- 
venient base 10 and they are called common logarithms. These are a 
re-scaling of natural logarithms: 


$$\log_{10} x = \left(\frac{1}{\log 10}\right)\log x = 0.43429\dots \log x.$$
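The change of base is a one-line computation. In the sketch below the function name `log_base` is an assumption; `math.log` is the natural logarithm and `math.log10` is used only as an independent check.

```python
from math import log, log10

def log_base(a, x):
    """Logarithm of x to base a, from natural logarithms: log x / log a."""
    return log(x) / log(a)

print(1 / log(10))                         # the re-scaling constant 0.43429...
print(log_base(10, 1000), log_base(2, 8))  # both are 3 up to rounding
print(log_base(10, 7.5), log10(7.5))       # agrees with the library value
```

Switching between natural and common logarithms is therefore a multiplication by a fixed constant, as the text says.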




Tables are available for common and natural logarithms, giving
$\log_{10} x$ and log x for various values of x. It is not the purpose of the
present text to give an account of the use of logarithms in practical
computations. What has been done is to fit the ordinary concept of
logarithms into the strict development of exponential and logarithmic
functions. The use of y=log x, where x and y are any real numbers
(x>0) and not only rationals, is now justified. Indeed it is only in
the present context that we can justify the everyday use of
logarithms.

Two power functions are obtained from the definition of $a^b$,
according to which of the pair of real numbers a and b is given and
which is the variable. The power function $y = a^x$, for a real constant
a>0, is defined on the domain of all real x. It is a development of the
exponential function:

$$y = a^x = e^{x \log a}$$
with a graph which is the same as that of $y = e^x$ except for a
re-scaling of the variable x. For: $a^x = e^{x'}$ if $x' = x \log a$. Properties (2)
above are those to have in mind in handling this power function.
The inverse function is $y = \log_a x$, with a graph which comes from
that of y=log x by a re-scaling of the variable y. For: if $y' = \log_a x$
and y=log x, then $y' = \left(\frac{1}{\log a}\right) y$ by property (3). Properties (4) are
relevant for $y = \log_a x$.
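Derivatives of $a^x$ and $\log_a x$ come from the relationship of these
functions to the exponential and logarithmic functions:

$$D a^x = D(e^{x \log a}) = (\log a)\,e^{x \log a} = a^x \log a$$

$$D \log_a x = \frac{1}{\log a} D \log x = \frac{1}{x \log a}$$

Also:
$$\int a^x\,dx = \int e^{x \log a}\,dx = \frac{e^{x \log a}}{\log a} = \frac{a^x}{\log a}$$

The expansions are to be obtained likewise:

$$a^x = e^{x \log a} = 1 + (\log a)x + \frac{(\log a)^2}{2!}x^2 + \frac{(\log a)^3}{3!}x^3 + \dots \quad (\text{all } x)$$

$$\log_a(1+x) = \frac{\log(1+x)}{\log a} = \frac{1}{\log a}\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots\right) \quad (-1<x<1)$$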


The other power function is $y = x^a$ for a real constant a, defined on
the domain of real x>0. This is also a modification of the exponential:

$$y = x^a = e^{a \log x}$$

i.e. it is an exponential function giving y in terms of log x. Over the
domain x>0, $y = x^a$ is continuous and, for a>0, increasing; its inverse is
$y = x^{1/a}$, another of the same group of functions. For example, $y = x^2$
and $y = \sqrt{x}$ are two particular cases, one the inverse of the other
(x>0). When a is irrational, the function $x^a$ is not algebraic. When a
is rational, we arrive back at an algebraic function $x^a$. Derivatives
and integrals follow (12.9 Ex. 17):

$$D(x^a) = a x^{a-1} \quad\text{and}\quad \int x^a\,dx = \frac{x^{a+1}}{a+1} \quad (a \neq -1).$$
So does the expansion of $(1+x)^a$:

$$(1+x)^a = 1 + ax + \frac{a(a-1)}{2!}x^2 + \frac{a(a-1)(a-2)}{3!}x^3 + \dots \quad (-1<x<1)$$

an extension of the Binomial Series of 11.7 above.

Two very useful limits can be established:


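THEOREM: $\frac{x^a}{e^x}$ and $\frac{\log x}{x^a}$ both tend to zero as $x \to \infty$, for a constant a>0.

Proof: Take x>0 and n any positive integer so that

$$e^x = 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + \dots > \frac{x^n}{n!}$$

and
$$0 < \frac{x^a}{e^x} < \frac{n!\,x^a}{x^n} = \frac{n!}{x^{n-a}} \quad (a>0).$$

Choose n>a so that $\frac{n!}{x^{n-a}} \to 0$ as $x \to \infty$.
Hence: $\frac{x^a}{e^x} \to 0$ as $x \to \infty$ (a>0).

Now write: $\frac{y^b}{e^y} \to 0$ as $y \to \infty$ (b>0).

Put y=log x and b=1/a. Then $x = e^y \to \infty$ as $y \to \infty$

and $\frac{(\log x)^{1/a}}{x} \to 0$ as $x \to \infty$ (a>0)

i.e. $\left\{\frac{(\log x)^{1/a}}{x}\right\}^a \to 0$ as $x \to \infty$ (a>0)

i.e. $\frac{\log x}{x^a} \to 0$ as $x \to \infty$ (a>0). Q.E.D.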

Combining the two results, we conclude that log x; $x^a$; $e^x$ are in
ascending order of magnitude (for large x), all increasing indefinitely
with x. In other words, $x^a$ tends to infinity faster than log x, and $e^x$
tends to infinity faster than $x^a$ or log x, as $x \to \infty$. Hence, the ratio of
a later to an earlier member of the ordered set of three, e.g. $\frac{e^x}{x^a} \to \infty$
as $x \to \infty$; the ratio of an earlier to a later member, e.g. $\frac{\log x}{x^a} \to 0$ as
$x \to \infty$. The graphs of y=log x, y=x and $y = e^x$ illustrate this important
relationship between the functions (Fig. 12.4). The relative
positions are preserved if (e.g.) $x^2$ or $\sqrt{x}$
is substituted for x.
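The ordering of log x, a power of x and the exponential can be illustrated by tabulating the two ratios of the theorem. The function names, the exponent a=2 and the sample points below are assumptions chosen for the example.

```python
from math import exp, log

def power_over_exp(x, a):
    """x^a / e^x, which the theorem says tends to 0 as x grows (a > 0)."""
    return x ** a / exp(x)

def log_over_power(x, a):
    """log x / x^a, which the theorem says tends to 0 as x grows (a > 0)."""
    return log(x) / x ** a

a = 2.0
for x in (10.0, 100.0, 500.0):
    print(x, power_over_exp(x, a), log_over_power(x, a))
```

Both ratios shrink as x grows, the first far more rapidly than the second. For a small exponent a the second ratio may rise at first and only falls once x is large, which is why the theorem is a statement about large x.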


12.5. Circular functions. The power series

$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots \quad\text{and}\quad x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$$

are absolutely (and uniformly) convergent for all real x. As functions of x,
the sums of these series define the circular functions.

[Fig. 12.4]

DEFINITION: The circular functions cos x and sin x are defined for all real x:

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots; \quad \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$$

where cos x is read ‘cosine x’ and sin x is read ‘sine x’.


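These functions are connected one with the other.* Being absolutely
convergent for all x, the series can be multiplied to give:

$$\sin x \sin y = \left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right)\left(y - \frac{y^3}{3!} + \frac{y^5}{5!} - \dots\right)$$

$$= xy - \left(\frac{x^3 y}{3!} + \frac{x y^3}{3!}\right) + \left(\frac{x^5 y}{5!} + \frac{x^3 y^3}{3!\,3!} + \frac{x y^5}{5!}\right) - \dots$$

$$= \frac{2xy}{2!} - \frac{4x^3 y + 4x y^3}{4!} + \frac{6x^5 y + 20x^3 y^3 + 6x y^5}{6!} - \dots$$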


* ‘Sine’ is derived from the Latin: sinus =curve. ‘Cosine’ is the complementary
term: cos x equals sin y where x and y are complementary angles (adding to a right
angle), see Fig. 12.7c below.


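and:
$$\cos x \cos y = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots\right)\left(1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \dots\right)$$

$$= 1 - \left(\frac{x^2}{2!} + \frac{y^2}{2!}\right) + \left(\frac{x^4}{4!} + \frac{x^2 y^2}{2!\,2!} + \frac{y^4}{4!}\right) - \dots$$

$$= 1 - \frac{x^2 + y^2}{2!} + \frac{x^4 + 6x^2 y^2 + y^4}{4!} - \frac{x^6 + 15x^4 y^2 + 15x^2 y^4 + y^6}{6!} + \dots$$

Put x=y and write $\sin^2 x$ for $(\sin x)^2$, $\cos^2 x$ for $(\cos x)^2$:

$$\sin^2 x = x^2 - \frac{x^4}{3} + \frac{2x^6}{45} - \dots \quad\text{and}\quad \cos^2 x = 1 - x^2 + \frac{x^4}{3} - \frac{2x^6}{45} + \dots = 1 - \sin^2 x.$$

Hence the first basic property of circular functions:
$$\sin^2 x + \cos^2 x = 1 \tag{1}$$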


We do not, strictly, need both functions, since one is given in terms
of the other: $\sin x = \pm\sqrt{1 - \cos^2 x}$, $\cos x = \pm\sqrt{1 - \sin^2 x}$. But these
are awkward relations and it is convenient to carry both functions,
with the relation (1) between them.

Further, subtracting the series for sin x sin y and cos x cos y, we
have:

$$\cos x \cos y - \sin x \sin y = 1 - \frac{x^2 + 2xy + y^2}{2!} + \frac{x^4 + 4x^3 y + 6x^2 y^2 + 4x y^3 + y^4}{4!}$$

$$\qquad - \frac{x^6 + 6x^5 y + 15x^4 y^2 + 20x^3 y^3 + 15x^2 y^4 + 6x y^5 + y^6}{6!} + \dots$$

$$= 1 - \frac{(x+y)^2}{2!} + \frac{(x+y)^4}{4!} - \frac{(x+y)^6}{6!} + \dots = \cos(x+y).$$

In a similar way (12.9 Ex. 19 and 20):

$$\sin x \cos y + \cos x \sin y = \sin(x+y).$$

Hence, the second basic property of circular functions gives addition
formulae:

$$\cos(x+y) = \cos x \cos y - \sin x \sin y$$
$$\sin(x+y) = \sin x \cos y + \cos x \sin y \tag{2}$$
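Since the circular functions are defined here purely by their power series, the two basic properties can be checked from the series alone. The sketch below is an illustration; the function names, the number of terms and the sample values x and y are assumptions.

```python
from math import factorial

def cos_series(x, terms=20):
    """Partial sum of 1 - x^2/2! + x^4/4! - ..."""
    return sum((-1) ** k * x ** (2 * k) / factorial(2 * k) for k in range(terms))

def sin_series(x, terms=20):
    """Partial sum of x - x^3/3! + x^5/5! - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1) for k in range(terms))

x, y = 0.7, 1.1
# Property (1): sin^2 x + cos^2 x = 1
print(sin_series(x) ** 2 + cos_series(x) ** 2)
# Property (2): the addition formulae
print(cos_series(x + y), cos_series(x) * cos_series(y) - sin_series(x) * sin_series(y))
print(sin_series(x + y), sin_series(x) * cos_series(y) + cos_series(x) * sin_series(y))
```

No trigonometry is used anywhere in the computation; the identities come out of the series themselves, which is the point of the development in the text.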


From the definition, cos x=1 and sin x=0 at x=0. Further:

$$\frac{\sin x}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \dots \to 1 \quad\text{as } x \to 0.$$

So:
$$\cos 0 = 1; \quad \sin 0 = 0; \quad \frac{\sin x}{x} \to 1 \text{ as } x \to 0 \tag{3}$$


The power series for cos x and sin x can be integrated term by
term:

$$\int \cos x\,dx = \text{constant} + \int 1\,dx - \frac{1}{2!}\int x^2\,dx + \frac{1}{4!}\int x^4\,dx - \frac{1}{6!}\int x^6\,dx + \dots$$

$$= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \quad (\text{constant} = 0)$$

$$= \sin x$$

and

$$\int \sin x\,dx = \text{constant} + \int x\,dx - \frac{1}{3!}\int x^3\,dx + \frac{1}{5!}\int x^5\,dx - \frac{1}{7!}\int x^7\,dx + \dots$$

$$= -1 + \frac{x^2}{2!} - \frac{x^4}{4!} + \frac{x^6}{6!} - \frac{x^8}{8!} + \dots \quad (\text{constant} = -1)$$

$$= -\cos x$$

where, apart from introducing particular constants, the arbitrary
constants are dropped for convenience (as usual with indefinite
integrals). By reversing the integration process in $\int \sin x\,dx = -\cos x$,
we get the derivative: $D \cos x = -\sin x$. Similarly: $D \sin x = \cos x$.
Hence, the standard forms:

$$D \cos x = -\sin x; \quad D \sin x = \cos x$$
$$\int \cos x\,dx = \sin x; \quad \int \sin x\,dx = -\cos x \tag{4}$$
One further derivative is needed:

$$D\left(\frac{\sin x}{\cos x}\right) = \frac{\cos x\,D \sin x - \sin x\,D \cos x}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x}$$

i.e.
$$D\left(\frac{\sin x}{\cos x}\right) = 1 + \left(\frac{\sin x}{\cos x}\right)^2 \tag{5}$$


As a particular case of the geometric series, and of the Binomial
Series of 11.7, the following power series is absolutely convergent
for -1<x<1:

$$\frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + \dots$$


On integrating term by term:

$$\int \frac{dx}{1+x^2} = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots$$

which is absolutely convergent for -1<x<1 and conditionally
convergent also for x=±1 (see 11.6). The appropriate range of
integration here is from 0 to x, so that the integral becomes zero at
x=0. Hence, consider $\int_0^x \frac{dt}{1+t^2}$, which is defined for all real x and which
is the sum of the power series above for -1<x<1. A new function
is so defined, called the ‘inverse tangent’ and denoted ‘tan⁻¹’. The
reason for this odd notation appears later.


DEFINITION: The inverse tangent function $\tan^{-1} x$ is:

$$\tan^{-1} x = \int_0^x \frac{dt}{1+t^2} \quad \text{defined for all real } x$$

$$= x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots \quad \text{for } -1<x<1.$$

From the definition as an integral, the standard form follows:
$$D \tan^{-1} x = \frac{1}{1+x^2} \tag{6}$$


Since $\tan^{-1} 0 = 0$ and $D \tan^{-1} x > 0$, all x, $y = \tan^{-1} x$ is a continuous
and increasing function of x such that $\tan^{-1} x < 0$ for x<0 and
$\tan^{-1} x > 0$ for x>0.

The inverse tangent function, continuous and increasing, has an
inverse which is also continuous and increasing and which is called
the tangent function,* denoted ‘tan’. Hence, if $y = \tan^{-1} x$, then
x=tan y; equally, if $x = \tan^{-1} y$, then y=tan x. The domain of the
tangent function is the range of the inverse tangent function, to be
determined but known to be some interval around x=0.

DEFINITION: The tangent function y=tan x is the inverse of the
inverse tangent: if $x = \tan^{-1} y$, then y=tan x for some interval around
x=0.


The interval is the range of $\tan^{-1} y$ as y takes all real values.

* The label is the same as for the ‘tangent’ to a curve, derived from the Latin:
tango, tangere =to touch. The slope of the tangent to a curve is tan α, where α is the
angle the tangent makes with the horizontal.


The derivative of tan x follows from the inverse function rule
(10.3). If y=tan x, so that $x = \tan^{-1} y$ and $D_y x = 1/(1+y^2)$ by (6), then:

$$D_x y = 1/D_y x = 1 + y^2.$$
Again write $\tan^2 x$ for $(\tan x)^2$. Hence, the standard form:
$$D \tan x = 1 + \tan^2 x \tag{7}$$

The properties of y=tan x are similar to those of the inverse tangent.
Since tan 0=0 and D tan x>0, y=tan x is continuous and increasing
(over its domain) such that tan x<0 for x<0 and tan x>0 for x>0.

Now, check the derivative (7) against the derivative (5): tan x, as
now defined, turns out to be the ratio of sin x to cos x, as previously
defined. In this case, therefore, we have not got a new function at all;
it is expressed in terms of functions already known. We do not need
a new notation. Having defined sin x and cos x, then $\tan x = \frac{\sin x}{\cos x}$
and $\tan^{-1} x$ is the inverse. But, just as we carry both cos x and sin x,
so we opt to use tan x subject to:

$$\tan x = \sin x / \cos x \tag{8}$$

There are three circular functions (cos x, sin x, tan x) related by (1)
and (8). The inverse, $\tan^{-1} x$, is given by the integral and series
written.

In the inverse tangent $y = \tan^{-1} x$, consider the particular value
taken by y when x=1, i.e. the sum of the series $1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots$.
As a matter of notation:

NOTATION: $\frac{1}{4}\pi = \tan^{-1} 1 = \int_0^1 \frac{dx}{1+x^2}$, so that:

$$\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots\right) = 3.14159\dots$$

This notation, in effect, defines the constant π. It remains to identify
it with the familiar π of elementary geometry (12.7 below). For the
moment, we simply have:

$$\tan^{-1} 1 = \pi/4 \quad\text{and}\quad \tan(\pi/4) = 1. \tag{9}$$
The results (3) and (9) summarise the particular values we know so
far:
cos x=1, sin x=0, tan x=0 at x=0
sin x/cos x=tan x=1 at x=π/4.
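The constant π, as introduced here, can be computed in the two ways the definition offers: as four times the sum of the series, and as four times the integral from 0 to 1. The following is a sketch under stated assumptions: the function names are illustrative, and the midpoint rule is simply one convenient way to approximate the integral.

```python
import math

def leibniz_pi(terms):
    """4 * (1 - 1/3 + 1/5 - 1/7 + ...) using the given number of terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def arctan_integral(x, n=1000):
    """Midpoint-rule estimate of the integral of 1/(1 + t^2) from 0 to x."""
    h = x / n
    return h * sum(1 / (1 + ((k + 0.5) * h) ** 2) for k in range(n))

print(leibniz_pi(100_000))       # slow: the error is about 1/terms
print(4 * arctan_integral(1.0))  # much closer for the same effort
print(math.pi)
```

The series converges only conditionally at x=1, which shows up as very slow convergence; the integral form is far more efficient numerically.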




Further values, and the graphs of the circular functions, are left over 
until the relationship with the trigonometric ratios is established 
(12.7). 


12.6. Complex exponents. The exponential $e^r$, r rational, is an ordinary
power of the constant e (e.g. $e^3 = e \times e \times e$ and $e^{1/2} = \sqrt{e}$). The extension
of $e^r$ to $e^x$, when x is a real variable, is achieved by the device of
taking $e^x$ as the sum of a power series. Here $e^x$ has no meaning apart
from the power series (except when x happens to be rational) but it
does satisfy the essential rule: $e^x \times e^y = e^{x+y}$. It would seem natural
to make a further extension from $e^x$ to $e^z$, where z is a complex
number x+iy. The extension, in fact, can be achieved without
introducing any really new ideas and it is a very important one. Our
aim is to write:

$$Z = e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots \tag{1}$$

where z=x+iy is a variable complex number and where Z=X+iY
is the complex value obtained from z by taking the exponential $e^z$.
There are two matters to consider in an appreciation of (1).

One point is that $Z = e^z$ is a particular case of a function of a
complex variable Z=F(z), as developed as a ‘conformal transformation’
in 7.6. Such a function involves the double process of ‘equating
real and imaginary parts’. Specifically, when the operation
represented by F(z) is performed on z=x+iy, a complex value emerges
which depends on x and y:

$$F(z) = \phi(x, y) + i\psi(x, y)$$

where the form of F determines what form the expressions φ and ψ
take. This is Z=X+iY so that X=φ(x, y) and Y=ψ(x, y) is the
conformal transformation. The function F(z) implies a pair of
(conformal) relations, one from the equation of ‘real’ parts, the other
from the ‘imaginary’ parts.

The other point concerns a power series in a complex variable, of
the kind written in (1). There is no difficulty here. Suppose $\sum u_n$ and
$\sum v_n$ are two (convergent) series and write $w_n = u_n + iv_n$ as a complex
number. Then the sum $\sum w_n$ of the series of complex terms $w_n$ simply
stands for $\sum u_n + i\sum v_n$:

NOTATION: if $w_n = u_n + iv_n$, write $\sum w_n = \sum u_n + i\sum v_n$ and call $\sum w_n$ the
sum of the series of complex terms $w_n$.

Again, in $\sum w_n$, the ‘real’ and ‘imaginary’ parts are added separately.
The definition of absolute convergence (11.5) extends: $\sum w_n$ is
absolutely convergent if the series of real positive terms $\sum |w_n|$ is
convergent, where $w_n = u_n + iv_n$ and $|w_n| = \sqrt{u_n^2 + v_n^2}$. The essential
result is:

THEOREM: $\sum w_n = \sum (u_n + iv_n)$ is absolutely convergent if and only if
$\sum u_n$ and $\sum v_n$ are both absolutely convergent.

Proof: If $\sum w_n$ is absolutely convergent, then $\sum |w_n|$ is convergent.
But $|u_n| \le \sqrt{u_n^2 + v_n^2} = |w_n|$ and similarly for $|v_n|$. Hence both
$\sum |u_n|$ and $\sum |v_n|$ are convergent, i.e. both $\sum u_n$ and $\sum v_n$ are
absolutely convergent. Conversely, if $\sum u_n$ and $\sum v_n$ are absolutely
convergent, then $\sum |u_n|$ and $\sum |v_n|$ are convergent series of positive
terms, and so is $\sum (|u_n| + |v_n|)$. Since $|w_n| = \sqrt{u_n^2 + v_n^2} \le |u_n| + |v_n|$,
$\sum |w_n|$ is convergent and $\sum w_n$ is absolutely convergent. Q.E.D.

Consequently, as long as $\sum u_n$ and $\sum v_n$ are absolutely convergent,
we can deal with the absolutely convergent series $\sum w_n$ of complex
terms $w_n = u_n + iv_n$. In particular, such series can be multiplied since
the property of 11.8 continues to hold. All this applies to a power
series $\sum a_n z^n$ where z=x+iy. The general term:

$a_n z^n = u_n + iv_n$ where $u_n$ and $v_n$ depend on x and y.

If $\sum u_n$ and $\sum v_n$ are absolutely convergent (real) series, then $\sum a_n z^n$ is
absolutely convergent. In this case:

$$Z = \sum a_n z^n = \sum (u_n + iv_n)$$
means Z=X+iY where $X = \sum u_n$ and $Y = \sum v_n$.

Absolutely convergent power series can always be multiplied
together.


The power series $1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots$ is absolutely convergent for all
z. To prove: write $w_n = \frac{z^n}{n!}$ so that $|w_n| = \frac{|z|^n}{n!}$, where $|z| = \sqrt{x^2 + y^2}$
and $\frac{|w_{n+1}|}{|w_n|} = \frac{|z|}{n+1} \to 0$ as $n \to \infty$ for all z.

Hence, by d’Alembert’s Test of 11.4, $\sum |w_n|$ is convergent, and the
power series (by the definition) is absolutely convergent for all z.


As a matter of notation, the sum of the power series is a function
of a complex variable written $e^z$:

NOTATION: $e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots$, absolutely convergent for any
complex z=x+iy.

This is an extension of $e^x$, and $e^z$ reduces to $e^x$ when y=0. The basic
property is obtained by multiplication exactly as in 12.2:

$$e^{z_1} \times e^{z_2} = e^{z_1 + z_2} \quad \text{for all } z_1 \text{ and } z_2. \tag{2}$$

As a case of (2), take $z_1 = x$ and $z_2 = iy$ (x and y real): $e^x \times e^{iy} = e^{x+iy}$.
Hence, if z=x+iy is any complex number, $e^z$ is the product of a real
factor $e^x$ and a complex value $e^{iy}$:

$$e^z = e^x e^{iy} \tag{3}$$
To complete the form (3), an explicit expression of $e^{iy}$ as a complex
number is required. From the power series:

$$e^{iy} = 1 + (iy) + \frac{(iy)^2}{2!} + \frac{(iy)^3}{3!} + \dots \quad (i^2 = -1)$$

$$= \left(1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \dots\right) + i\left(y - \frac{y^3}{3!} + \frac{y^5}{5!} - \dots\right)$$

i.e.
$$e^{iy} = \cos y + i \sin y \tag{4}$$

by the power series for the circular functions. Putting (4) into (3):

$$e^z = e^x(\cos y + i \sin y) \tag{5}$$
So, if $Z = e^z$ and Z=X+iY, then $X = e^x \cos y$ and $Y = e^x \sin y$. The
expression (5) is all we need; the conformal transformation $Z = e^z$ is
very simple:

$$X = e^x \cos y \quad\text{and}\quad Y = e^x \sin y.$$
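The result (5) can be confirmed by summing the power series with complex arithmetic and comparing real and imaginary parts. This is a sketch only: the function name `exp_series`, the number of terms and the sample value of z are assumptions, and Python's `cmath.exp` serves as an independent check.

```python
import cmath
import math

def exp_series(z, terms=40):
    """Partial sum of 1 + z + z^2/2! + z^3/3! + ... for complex z."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)   # turns z^n/n! into z^(n+1)/(n+1)!
    return total

z = 1.5 + 2.0j
Z = exp_series(z)
X = math.exp(z.real) * math.cos(z.imag)   # real part from (5)
Y = math.exp(z.real) * math.sin(z.imag)   # imaginary part from (5)
print(Z, complex(X, Y), cmath.exp(z))
```

The real and imaginary parts of the series sum agree with the two relations of the conformal transformation.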
Notice that the definition of cos x and sin x gives:
$$\cos(-x) = \cos x \quad\text{and}\quad \sin(-x) = -\sin x$$

which is sometimes summarised by saying that cos x is an ‘even’
function and that sin x is an ‘odd’ function. Hence (4) gives:

$$\cos x + i \sin x = e^{ix} \quad\text{and}\quad \cos x - i \sin x = e^{-ix} \tag{6}$$
Add (6) to give: $2 \cos x = e^{ix} + e^{-ix}$
and subtract: $2i \sin x = e^{ix} - e^{-ix}$

So:
$$\cos x = \frac{1}{2}\left(e^{ix} + e^{-ix}\right) \quad\text{and}\quad \sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right) \tag{7}$$


The results (7) are remarkable, and most useful. The expressions on
the right-hand sides are complex values; but in each case the
‘imaginary’ part disappears and the expressions reduce to real values
(cos x and sin x respectively). They also suggest a further notation,
in real values only:

NOTATION: $\cosh x = \frac{1}{2}(e^x + e^{-x})$ and $\sinh x = \frac{1}{2}(e^x - e^{-x})$.

Cosh x and sinh x are called hyperbolic functions. There is nothing
new about them; they are merely convenient notations for the
exponential expressions shown.
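The disappearance of the imaginary part in (7), and the parallel real definitions of the hyperbolic functions, can be seen numerically. The function names in this sketch are assumptions made for illustration; the library functions in `math` are used only for comparison.

```python
import cmath
import math

def cos_from_exp(x):
    """(e^(ix) + e^(-ix)) / 2, computed in complex arithmetic."""
    return (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2

def sin_from_exp(x):
    """(e^(ix) - e^(-ix)) / (2i), computed in complex arithmetic."""
    return (cmath.exp(1j * x) - cmath.exp(-1j * x)) / 2j

def cosh_def(x):
    """cosh x as the notation (e^x + e^(-x)) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2

def sinh_def(x):
    """sinh x as the notation (e^x - e^(-x)) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2

x = 0.9
print(cos_from_exp(x), math.cos(x))   # imaginary part is zero up to rounding
print(sin_from_exp(x), math.sin(x))
print(cosh_def(x), math.cosh(x), sinh_def(x), math.sinh(x))
```

The complex expressions return values whose imaginary parts vanish, while the hyperbolic functions are the same formulae with the factor i removed.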


12.7. Trigonometric functions. The circular functions, as defined
above, have nothing whatever to do with trigonometry. For any real
value x, cos x and sin x are obtained as the sums of particular series
and tan x is sin x/cos x. It is not assumed that x is the measure of an
angle; it is not assumed that cos x, sin x and tan x are trigonometric
ratios between the sides of a right-angled triangle.

We now proceed to establish that they can be interpreted in these
ways. First, what do we mean by the measure of an angle? We may
try to pass the buck by saying that the measure is such that, if an
arc AP of a circle (centre O, radius r) subtends an angle θ at O, the
length of the arc AP is rθ and the area of the sector AOP is ½r²θ.
But the pass must be refused. The question remains: what do we
mean by the length of an arc or the area of a sector? Nothing is
defined; we need to remedy the defect. We choose to do so by
applying the general concept of an area (10.4) to a circle.* At the
same time, we introduce the trigonometric ratios and link them with
the circular functions.

To put the problem specifically: in Fig. 12.7a, let A be a given point
and P a variable point on a circle, centre O and radius r. A measure x
is to be assigned to the angle AOP in such a way that the area of the
sector AOP is ½r²x. Further, with this measure x, and with tan x as
defined in 12.5, tan x is to be shown equal to the trigonometric ratio
of MP to OM, where PM is perpendicular to OA. This is a formidable
assignment.

[Fig. 12.7a]

* The length of an arc of a curve can also be defined in terms of integrals, and it can
then be applied to a circle. This is more difficult than for areas and we are well advised
to stick to areas here.

Insert axes Ox and Oy as shown in Fig. 12.7a and take (for the
moment) the point P (u, v) in the positive quadrant, ∠AOP between
zero and a right angle. Write t=MP/OM, the ratio with which we
are most concerned. Any point P (u, v) on the quarter circle from
A (r, 0) to B (0, r) satisfies $u^2 + v^2 = r^2$ (u>0, v>0). Now $t = v/u$ so that
$u^2 + t^2 u^2 = r^2$ (u>0, t>0). Hence:

$$u = \frac{r}{\sqrt{1+t^2}} \quad\text{and}\quad v = \frac{rt}{\sqrt{1+t^2}} \quad (t>0).$$

It follows that the point (x, y), where $x = \frac{r}{\sqrt{1+\tau^2}}$ and $y = \frac{r\tau}{\sqrt{1+\tau^2}}$,
describes the arc AP from A to P as τ increases from 0 to t. The area
PMA under the circle from M to A is given by:

$$\int_u^r y\,dx = \int_t^0 y\,\frac{dx}{d\tau}\,d\tau = \int_0^t \frac{r\tau}{\sqrt{1+\tau^2}} \cdot \frac{r\tau}{\sqrt{(1+\tau^2)^3}}\,d\tau$$

$$= r^2 \int_0^t \frac{d\tau}{1+\tau^2} - r^2 \int_0^t \frac{d\tau}{(1+\tau^2)^2} = \frac{1}{2} r^2 \left[\tan^{-1}\tau - \frac{\tau}{1+\tau^2}\right]_0^t$$

on integrating by parts. Hence the area PMA is:

$$-\frac{1}{2} r^2 \frac{t}{1+t^2} + \frac{1}{2} r^2 \tan^{-1} t.$$

The area of the triangle OPM is $\frac{1}{2}uv = \frac{1}{2} r^2 \frac{t}{1+t^2}$. The area S of the sector
AOP is the sum of the area PMA and the triangle OPM. Hence:

$$S = \frac{1}{2} r^2 \tan^{-1} t.$$


We now have our answers. As a definition, take the measure of the
angle AOP as $x = \tan^{-1} t$ and call the unit radians. Here the variable
t is the trigonometric ratio of MP to OM. Hence t=tan x, so that
tan x (x measure of the angle in radians) is the trigonometric ratio t.
Further, the area of the sector AOP is ½r²x.

DEFINITION: The measure of the angle AOP in radians is $x = \tan^{-1} t$
where t is the trigonometric ratio MP/OM.
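The sector area formula can be checked independently of the integration by parts, by integrating the circle directly in x and adding the triangle. This is a sketch under stated assumptions: the function name, the midpoint rule and the number of strips are illustrative choices, and `math.atan` stands in for the inverse tangent.

```python
from math import atan, sqrt

def sector_area_numeric(r, t, n=100_000):
    """Area PMA under the circle from M to A (midpoint rule) plus triangle OPM."""
    u = r / sqrt(1 + t * t)        # OM
    v = r * t / sqrt(1 + t * t)    # MP
    h = (r - u) / n
    area_pma = h * sum(
        sqrt(max(r * r - (u + (k + 0.5) * h) ** 2, 0.0)) for k in range(n)
    )
    return area_pma + 0.5 * u * v

r, t = 2.0, 1.0
print(sector_area_numeric(r, t), 0.5 * r * r * atan(t))
```

With r=2 and t=1 both values are close to π/2, the area of a sector of half a right angle in a circle of radius 2.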


This is for t>0, i.e. P in the positive quadrant. The two extreme
values, at A and B, have t=0 and t→∞ respectively.

The constant π can now be introduced: tan ¼π=1 by (9) of 12.5,
i.e. π/4 is the angle for which t=MP/OM=1. The triangle OMP
is then isosceles, half a square, i.e. the angle π/4 is half a right angle.
So:

Right angle = ½π radians

and at A (t=0) the angle x=0, at B (t→∞) the angle x=½π radians.*
Since the area of the sector AOP is ½r²x, it follows that the area of the
quarter-circle (x=½π) is ¼πr² and the area of the circle is πr².

[Fig. 12.7b]

If t is the trigonometric ratio MP/OM, then the function $t=\tan x$
increases over the range $t>0$ as x increases over the domain
$0 < x < \frac{1}{2}\pi$.

The domain of $t=\tan x$ can be extended, by letting P swing round
the circle, anti-clockwise as in Fig. 12.7b, running through four right
angles. Then:

                         Sign of   Sign of        t = tan x
    Angle x                OM        MP        Sign    Range
    0 < x < π/2            +         +          +      0 to ∞
    π/2 < x < π            -         +          -      -∞ to 0
    π < x < 3π/2           -         -          +      0 to ∞
    3π/2 < x < 2π          +         -          -      -∞ to 0

The graph of $t=\tan x$, over the domain $0 < x < 2\pi$, is as shown. For
larger x, as P swings around the circle for a second, third, ... time,
the values of $t=\tan x$ repeat themselves. Similarly, $t=\tan x$ repeats
to the left, for negative values of x, P swinging around the circle in
the clockwise direction.

* To put into degrees (the elementary angle measure), we re-scale so that the right
angle is 90°: $\frac{1}{2}\pi$ radians = 90°. An angle of x radians is $\dfrac{180}{\pi}x$ degrees.

Finally, having got $\tan x$ as the trigonometric ratio MP/OM (in
Fig. 12.7a), we proceed to interpret $\cos x$ and $\sin x$ similarly. If
$t=\tan x$, then

$$\sin x/\cos x = t \quad\text{and}\quad \sin^2x + \cos^2x = 1.$$

Hence: $\sin x = t\cos x$ and $t^2\cos^2x + \cos^2x = 1$
i.e. $\cos x = 1/\sqrt{1+t^2}$ and $\sin x = t/\sqrt{1+t^2}$.
Since $$1+t^2 = 1 + \frac{MP^2}{OM^2} = \frac{OP^2}{OM^2}$$
i.e. $\cos x = OM/OP$ and $\sin x = MP/OP$

which are the well-known trigonometric ratios. This is for $0 < x < \frac{1}{2}\pi$
but the extension to other x follows as for $\tan x$. Referring to Fig.
12.7b, we see that, as x increases from 0 (at A), $\cos x$ decreases from
1 ($x=0$) to 0 ($x=\frac{1}{2}\pi$), to $-1$ ($x=\pi$); and then increases again to
0 ($x=\frac{3}{2}\pi$) and back to 1 ($x=2\pi$). This cycle is repeated in subsequent
intervals $2\pi < x < 4\pi$, $4\pi < x < 6\pi$, ... and similarly for $x<0$. The
graph of $y=\cos x$ is shown in Fig. 12.7c. The graph of $y=\sin x$
is similar, but $\sin x=0$ at $x=0$ and $\sin x=1$ at $x=\frac{1}{2}\pi$. Both
functions are periodic, repeating themselves in periods of $2\pi$ for x.

[Fig. 12.7c: graph of y = cos x]

Consider now the representation of a complex number $z=x+iy$ as
a point P (x, y) on an Argand Diagram, as in Fig. 2.5a above.
Let OP = r and let the angle OP makes with Ox be $\theta$ radians. Then:

$$x/r = OM/OP = \cos\theta \quad\text{i.e.}\quad x = r\cos\theta$$

and

$$y/r = MP/OP = \sin\theta \quad\text{i.e.}\quad y = r\sin\theta.$$




Hence: $z = x+iy = r(\cos\theta + i\sin\theta)$.
Further: $x^2+y^2 = r^2(\cos^2\theta + \sin^2\theta) = r^2$
and $\dfrac{y}{x} = \dfrac{r\sin\theta}{r\cos\theta} = \tan\theta$.

Hence: $r = \sqrt{x^2+y^2}$ and $\theta = \tan^{-1} y/x$

where $r = |z|$ is the absolute value or modulus of the complex number
z and where $\theta$ is its argument or amplitude (subject to the condition
of Appendix A.9).

The results (6) or (7) of 12.6 are useful in the further development
of a complex number in terms of its absolute value r and argument $\theta$:

$$z = r(\cos\theta + i\sin\theta) = re^{i\theta} \qquad (1)$$

Multiplication is easy with (1). If $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$:

$$z = z_1z_2 = r_1e^{i\theta_1} \times r_2e^{i\theta_2} = r_1r_2e^{i(\theta_1+\theta_2)} = re^{i\theta}$$

where $r = r_1r_2$ is the absolute value and $\theta = \theta_1+\theta_2$ is the argument of
z. In a product of complex numbers, absolute values are multiplied
and arguments added.
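The product rule can be illustrated with Python's standard `cmath` module. This is only a sketch under assumed sample values ($r_1=2$, $\theta_1=0.5$, $r_2=3$, $\theta_2=1.1$, chosen for illustration); the helper name `multiply_polar` is hypothetical.

```python
import cmath
import math

def multiply_polar(r1, theta1, r2, theta2):
    """Product in polar form: multiply absolute values, add arguments."""
    return r1 * r2, theta1 + theta2

# Two sample complex numbers given by absolute value and argument.
z1 = cmath.rect(2.0, 0.5)   # 2 e^{0.5 i}
z2 = cmath.rect(3.0, 1.1)   # 3 e^{1.1 i}

r, theta = multiply_polar(2.0, 0.5, 3.0, 1.1)
product = z1 * z2           # ordinary Cartesian multiplication

print(abs(product), cmath.phase(product))  # compare with r and theta
print(r, theta)
```

Note that `cmath.phase` returns a value between $-\pi$ and $\pi$, so for larger angles the sum of the arguments agrees with it only up to a multiple of $2\pi$.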

The remarkable relations between the basic constants e and $\pi$ can
now be given. Put $\theta = \frac{1}{2}\pi$ in (1), with $r=1$, noting that $\cos\frac{1}{2}\pi = 0$ and $\sin\frac{1}{2}\pi = 1$:
$e^{i\pi/2} = i$. Similar results follow when $\theta = \pi$, $\frac{3}{2}\pi$, $2\pi$ are substituted:

$$e^{i\pi/2} = i;\quad e^{i\pi} = -1;\quad e^{3i\pi/2} = -i;\quad e^{2i\pi} = 1 \qquad (2)$$

The relations (2) can be interpreted in a variety of ways. They express
the fact that ‘multiplication by i’ means ‘rotation through a right
angle’ on an Argand Diagram.* They represent in turn the fixed
points B, A', B' and A on the Diagram (Fig. 2.5a). For, B is (0, 1)
in Cartesian co-ordinates, i.e. $z = 0 + 1i = i$; the absolute value of z
is $r=1$ and argument $\theta = \frac{1}{2}\pi$, i.e. $z = 1 \times e^{i\pi/2} = e^{i\pi/2}$. The others follow
similarly. Finally, the set $\{i, -1, -i, 1\}$ forms a cyclic group, and so
does the set

$$\{e^{i\pi/2},\ e^{i\pi},\ e^{3i\pi/2},\ e^{2i\pi}\}.$$

The fact that this is a cyclic group follows from the relation: $e^{2i\pi} = 1$.
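A short numerical sketch of the relations (2), using the standard `cmath` module: the four values $e^{ik\pi/2}$ ($k=1,2,3,4$) are compared with $i, -1, -i, 1$, and multiplication by i is shown to turn a point through a right angle. The names `unit` and `rotate_quarter` and the sample point $3+4i$ are illustrative assumptions.

```python
import cmath
import math

def unit(theta):
    """Complex number of absolute value 1 and argument theta."""
    return cmath.exp(1j * theta)

# e^{i pi/2}, e^{i pi}, e^{3 i pi/2}, e^{2 i pi}
powers = [unit(k * math.pi / 2) for k in (1, 2, 3, 4)]
expected = [1j, -1, -1j, 1]

def rotate_quarter(z):
    """Multiplication by i: rotation through a right angle."""
    return z * 1j

for p, e in zip(powers, expected):
    print(p, e)                      # equal up to rounding error
print(rotate_quarter(complex(3, 4)))  # the point (3, 4) goes to (-4, 3)
```

Each successive power is obtained from the previous one by a further multiplication by i, which is what makes the set cyclic.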


* Multiply $z = re^{i\theta}$ by $i = e^{i\pi/2}$: $z \times i = re^{i\theta}e^{i\pi/2} = re^{i(\theta+\pi/2)}$. The complex number z is
rotated through a right angle on an Argand Diagram.




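12.8. Summary of results

Properties:

    $e^{x_1} \times e^{x_2} = e^{x_1+x_2}$                      $a^x = e^{x\log a}$
    $\log xy = \log x + \log y$                                 $\log x^a = a\log x$
    $\sin^2x + \cos^2x = 1$                                     $\tan x = \sin x/\cos x$
    $\cos(x+y) = \cos x\cos y - \sin x\sin y$                   $\sin(x+y) = \sin x\cos y + \cos x\sin y$
    $\cos x + i\sin x = e^{ix}$                                 $\cos x - i\sin x = e^{-ix}$
    $\cos x = \frac{1}{2}(e^{ix} + e^{-ix})$                    $\sin x = \frac{1}{2i}(e^{ix} - e^{-ix})$

    $z = x+iy = r(\cos\theta + i\sin\theta) = re^{i\theta}$
    where $r = \sqrt{x^2+y^2}$ (absolute value), $\theta = \tan^{-1}y/x$ (argument)

    $e^{2i\pi} = 1$.

Expansions:

    $e^x = \exp x = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \dots$
    $\log(1+x) = x - \dfrac{x^2}{2} + \dfrac{x^3}{3} - \dots$
    $\frac{1}{2}\log\dfrac{1+x}{1-x} = x + \dfrac{x^3}{3} + \dfrac{x^5}{5} + \dots$
    $(1+x)^a = 1 + ax + \dfrac{a(a-1)}{2!}x^2 + \dots$
    $\cos x = 1 - \dfrac{x^2}{2!} + \dfrac{x^4}{4!} - \dfrac{x^6}{6!} + \dots$
    $\sin x = x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!} - \dfrac{x^7}{7!} + \dots$
    $\tan^{-1}x = x - \dfrac{x^3}{3} + \dfrac{x^5}{5} - \dfrac{x^7}{7} + \dots$

    (the series for $\log(1+x)$, $\frac{1}{2}\log\frac{1+x}{1-x}$, $(1+x)^a$ and $\tan^{-1}x$ being for $-1 < x < 1$)

Derivatives:

    $De^x = e^x$;   $De^{f(x)} = f'(x)e^{f(x)}$                 $Da^x = a^x\log a$
    $D\log x = \dfrac{1}{x}$;   $D\log f(x) = \dfrac{f'(x)}{f(x)}$      $Dx^a = ax^{a-1}$
    $D\cos x = -\sin x$                                         $D\sin x = \cos x$
    $D\tan x = 1 + \tan^2x$                                     $D\tan^{-1}x = 1/(1+x^2)$

Integrals:

    $\int e^x\,dx = e^x$;   $\int e^{f(x)}f'(x)\,dx = e^{f(x)}$         $\int a^x\,dx = \dfrac{a^x}{\log a}$
    $\int \dfrac{dx}{x} = \log x$;   $\int \dfrac{f'(x)}{f(x)}\,dx = \log f(x)$      $\int x^a\,dx = \dfrac{x^{a+1}}{a+1}$  $(a \neq -1)$
    $\int \cos x\,dx = \sin x$                                  $\int \sin x\,dx = -\cos x$
    $\int \dfrac{dx}{1+x^2} = \tan^{-1}x$

Limits:

    $\lim_{n\to\infty}\left(1 + \dfrac{x}{n}\right)^n = e^x$             $\lim_{x\to 0}\dfrac{\sin x}{x} = 1$
    $\lim_{x\to\infty}\dfrac{\log x}{x^a} = \lim_{x\to\infty}\dfrac{x^a}{e^x} = 0$  $(a>0)$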
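Two of the expansions listed can be checked by summing a finite number of terms. The sketch below is illustrative only: the function names and the numbers of terms are assumptions, and the partial sums are compared with the values given by Python's `math` module.

```python
import math

def exp_series(x, terms=30):
    """Partial sum of 1 + x + x^2/2! + x^3/3! + ..."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # next term x^(n+1)/(n+1)!
    return total

def arctan_series(x, terms=200):
    """Partial sum of x - x^3/3 + x^5/5 - x^7/7 + ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

print(exp_series(1.0), math.e)
print(arctan_series(0.5), math.atan(0.5))
print(4 * arctan_series(1.0), math.pi)   # slow convergence at x = 1
```

The series for $e^x$ converges rapidly for any x; the inverse tangent series converges quickly for small x but very slowly at $x=1$, where 200 terms give $\pi$ to only about two decimal places.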


12.9. Exercises 


1. Show that the derivative and integral of $e^{-x}$ are both $-e^{-x}$. Write $De^{\frac{1}{2}x^2}$
and $De^{-\frac{1}{2}x^2}$ and deduce that $\int xe^{\frac{1}{2}x^2}dx = e^{\frac{1}{2}x^2}$ and $\int xe^{-\frac{1}{2}x^2}dx = -e^{-\frac{1}{2}x^2}$.

2. Normal distribution. Show that $y = y_0e^{-\frac{1}{2}x^2}$ has a single extreme value, a
maximum $y_0$ at $x=0$, and that $y\to 0$ as $x\to\pm\infty$. Indicate the shape of the
curve $y = e^{-\frac{1}{2}x^2}$, the basic form of the ‘normal distribution’ of statistics.

3. By the standard form (3) of 12.3 show that $D\log e^x = 1$ and deduce that
$\log e^x = x$.

4. Write $D\log(1+x)$ and $D\log(1+x^2)$ and generalise to:

$$D\log(a_nx^n + a_{n-1}x^{n-1} + \dots + a_1x + a_0) = \frac{na_nx^{n-1} + (n-1)a_{n-1}x^{n-2} + \dots + a_1}{a_nx^n + a_{n-1}x^{n-1} + \dots + a_1x + a_0}.$$

5. Integrate by parts to derive: $\int xe^{-x}dx = -xe^{-x} + \int e^{-x}dx = -e^{-x}(x+1)$.
Use $xe^{-x}\to 0$ as $x\to\infty$ to deduce that $\int_0^\infty xe^{-x}dx = 1$ is convergent.

6. By the method and result of Ex. 5, show that $I_n = \int x^ne^{-x}dx$ can be
expressed by the reduction formula: $I_n = -x^ne^{-x} + nI_{n-1}$; and that $\int_0^\infty x^ne^{-x}dx$
is convergent. Deduce that $\int_0^\infty x^ne^{-x}dx = n\int_0^\infty x^{n-1}e^{-x}dx$ and that n factorial
can be written as this infinite integral: $n! = \int_0^\infty x^ne^{-x}dx$.


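7. Algebraic and non-algebraic functions. Establish the following:

    function              derivative                  integral
    $1+x$                 $1$                         $\frac{1}{2}(1+x)^2$
    $\dfrac{1}{1+x}$      $-\dfrac{1}{(1+x)^2}$       $\log(1+x)$  $(x>-1)$
    $\dfrac{1}{1-x}$      $\dfrac{1}{(1-x)^2}$        $-\log(1-x)$  $(x<1)$
    $\dfrac{1}{1+x^2}$    $-\dfrac{2x}{(1+x^2)^2}$    $\tan^{-1}x$
    $\dfrac{1}{1-x^2}$    $\dfrac{2x}{(1-x^2)^2}$     $\frac{1}{2}\log\dfrac{1+x}{1-x}$  $(|x|<1)$

Hence illustrate the fact that derivatives of algebraic functions must be
algebraic and that derivatives of non-algebraic functions may be algebraic.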
8. Check back, by writing derivatives, that:

$$\int\frac{dx}{\sqrt{x(x-1)}} = 2\log(\sqrt{x} + \sqrt{x-1}) \quad (\text{for } x>1); \qquad = -2\log(\sqrt{-x} + \sqrt{1-x}) \quad (\text{for } x<0),$$

with $\dfrac{1}{\sqrt{x(x-1)}}$ not defined for $0 < x < 1$. Show that

$$\int\frac{dx}{\sqrt{x(1-x)}} = 2\tan^{-1}\sqrt{\frac{x}{1-x}} \quad \text{for } 0 < x < 1.$$

Draw graphs of $y = \dfrac{1}{\sqrt{x(x-1)}}$ and of $y = \dfrac{1}{\sqrt{x(1-x)}}$ to illustrate.


9. Given $Du = D\log y = r$ (constant), show that $\log y = rx + A$ (A constant)
and that the only function to satisfy the given condition is the exponential
$y = y_0e^{rx}$. (Here $A = \log y_0$.) Can r be taken as negative as well as positive?

*10. A function f(x) satisfies the relation $f(x) \times f(y) = f(x+y)$ for all real x
and y. Show that the exponential function $y = y_0e^{rx}$ (with $y_0 = 1$) is one function which does.
To prove the converse, that it is the only function which does, proceed as
follows. If f(x) satisfies the relation, show that $f'(x) = \dfrac{f'(n)}{f(y)}$ where $n = x+y$, y
fixed. Similarly, $f'(y) = \dfrac{f'(n)}{f(x)}$. Deduce that $\dfrac{f'(x)}{f(x)} = \dfrac{f'(y)}{f(y)}$ = constant (r), all x and
y; and so that $f'(x) = rf(x)$. Use Ex. 9 to derive the required result.


11. From the definition of exp x and (1) of 12.2, show that $\frac{1}{h}(\exp h - 1)\to 1$
as $h\to 0$ and that $\frac{1}{h}\{\exp(x+h) - \exp x\}\to \exp x$ as $h\to 0$. This establishes from
first principles that exp x is continuous with $D\exp x = \exp x$.

12. Semi-logarithmic graphs. The rate of change $f'(a)$ is the tangent slope of
the graph of $y=f(x)$ at the point where $x=a$. Show that the proportionate
rate of change of f(x) at $x=a$ is the tangent slope of the graph of $u = \log f(x)$
at the point $x=a$. The graph of $u = \log f(x)$ is called the semi-logarithmic
graph of $y=f(x)$; it plots log y rather than y against x. Show that the exponential
function $y = y_0e^{rx}$ is a line of slope r on a semi-logarithmic graph.

13. Compound interest. Let £x be the amount of £a after t years when interest
is compounded at 100r per cent per year. Show that $x = a(1+r)^t$ for yearly
compounding and $x = a(1+\frac{1}{2}r)^{2t}$ for twice-yearly compounding. Generally,
reckoning interest n times a year, show that $x = a\left(1+\dfrac{r}{n}\right)^{nt}$. Let $n\to\infty$ and use
(6) of 12.3 to show that the exponential $x = ae^{rt}$ represents growth when interest
is compounded continuously at the rate of 100r per cent per year.

14. An investment doubles in n years at interest compounded annually at
p per cent per year. Show that n is a function of p given by $\left(1+\dfrac{p}{100}\right)^n = 2$. Use
logarithm tables to show that n = 14.2 approximately when p = 5. Generally
show that

$$\frac{\log 2}{n} = \frac{p}{100} - \frac{1}{2}\left(\frac{p}{100}\right)^2 + \frac{1}{3}\left(\frac{p}{100}\right)^3 - \dots.$$

Neglect $p^2$ and higher powers, take the natural logarithm log 2 = 0.7 approximately, and get the
practical rule of thumb: $n = \dfrac{70}{p}$ approximately (e.g. $n = \frac{70}{5} = 14$ for p = 5).


15. Population growth. Let x be the number of births at time t. Assume that
x increases at the steady proportionate rate of 100r per cent per year and that
the life span is 50 years (everyone dies aged 50). Write $y = \int_{t-50}^{t} x\,dt$ for the
total live population, and $u = \int_{-\infty}^{t} x\,dt$ for the number ever born, at time t.
Show that y and u increase at 100r per cent per year and that $\dfrac{y}{u} = 1 - e^{-50r}$ =
constant over time. For a 2 per cent increase (r = 0.02) show that the live
population is always 63 per cent of the total population throughout time, i.e.
that the present population outnumbers their ancestors.

16. Show that $\log_{10}(1+x) = 0.43429\ldots(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \dots)$ for $|x|<1$
and for $x=1$. Indicate how this gives a method of evaluating $\log_{10}a$, directly
for $1 < a \le 2$ and by use of (4) of 12.4 for other a.


17. Write $De^{a\log x} = \dfrac{a}{x}e^{a\log x}$ and deduce $Dx^a = ax^{a-1}$ and $\int x^a\,dx = \dfrac{x^{a+1}}{a+1}$
$(a \neq -1)$. From successive derivatives of $x^a$ at $x=1$, obtain the expansion of
$(1+x)^a$ for real a as a Taylor's series.

18. Graph $y = x^2$ and $y = e^x - 1$ for $x>0$, using the same axes and scales and
illustrate that $x^2$ increases less quickly than $e^x$ as $x\to\infty$. Similarly illustrate
the tendencies of $\sqrt{x}$ and $\log x$ as $x\to\infty$ by a graph of $y = \sqrt{x}$ and $y = 1+\log x$.

*19. Write the general term of the series for cos x as $(-1)^n\dfrac{x^{2n}}{(2n)!}$ and show that
the general term of the product of the series for cos x and cos y is

$$\frac{(-1)^n}{(2n)!}\left\{x^{2n} + \frac{2n(2n-1)}{2!}x^{2n-2}y^2 + \frac{2n(2n-1)(2n-2)(2n-3)}{4!}x^{2n-4}y^4 + \dots + y^{2n}\right\}.$$

Write the corresponding term in the product of sin x and sin y and deduce
that the general term in the expansion of $\cos x\cos y - \sin x\sin y$ is

$$\frac{(-1)^n}{(2n)!}\left\{x^{2n} + 2nx^{2n-1}y + \frac{2n(2n-1)}{2!}x^{2n-2}y^2 + \dots + 2nxy^{2n-1} + y^{2n}\right\}.$$

Identify the terms in brackets as $(x+y)^{2n}$ and the expansion as that of
$\cos(x+y)$.

*20. Repeat the steps of Ex. 19 to show that the general term in the
expansion of $\sin x\cos y + \cos x\sin y$ is the same as that in $\sin(x+y)$.

21. Obtain $\int e^x\cos x\,dx = e^x\cos x + \int e^x\sin x\,dx$ and a similar result for
$\int e^x\sin x\,dx$ by integration by parts. Deduce that

$$\int e^x\cos x\,dx = \tfrac{1}{2}e^x(\sin x + \cos x) \quad\text{and}\quad \int e^x\sin x\,dx = \tfrac{1}{2}e^x(\sin x - \cos x).$$

22. By the method of Ex. 21, find $\int e^{-x}\cos x\,dx$ and $\int e^{-x}\sin x\,dx$ and show
that $\int_0^\infty e^{-x}\cos x\,dx = \int_0^\infty e^{-x}\sin x\,dx = \frac{1}{2}$.

*23. In view of the fact that sin x is bounded (oscillatory with period $2\pi$),
what can be said about the limits of $\sin x$ as $x\to\infty$ and of $\sin\left(\frac{1}{x}\right)$ as $x\to 0$?
If $f(x) = \sin\pi x$, show that it is fallacious to infer $\lim f(x)$ as $x\to\infty$ from the limit of the
sequence f(n) for n = 1, 2, 3, .... (In this sequence, f(n) = 0 all n.) Consider the
crescendo oscillation of $\sin\left(\frac{1}{x}\right)$ for small x and indicate the nature of the
discontinuity of $\sin\left(\frac{1}{x}\right)$ at $x=0$.


24. Hyperbolic functions. From the definition (12.6), show that cosh x is an
‘even’ function and sinh x an ‘odd’ function such that $\cosh^2x - \sinh^2x = 1$.
Show that $D\cosh x = \sinh x$ and $D\sinh x = \cosh x$; check that $D\tan^{-1}(e^x)$
can be expressed as the reciprocal of $2\cosh x$.


25. Deduce from $\tan\frac{1}{4}\pi = 1$ that $\cos\frac{1}{4}\pi = \sin\frac{1}{4}\pi = \dfrac{1}{\sqrt{2}}$ and check by the
trigonometric ratios of 45° ($\frac{1}{4}\pi$ radians) in an isosceles right-angled triangle.

26. A cube root of unity is $\omega = \frac{1}{2}(-1 + i\sqrt{3})$. By the corresponding point on
an Argand diagram (3.8 above), show that the argument of $\omega$ is $\frac{2}{3}\pi$ radians.
Deduce from (1) of 12.7 that $\cos\frac{2}{3}\pi = -\frac{1}{2}$ and $\sin\frac{2}{3}\pi = \frac{1}{2}\sqrt{3}$.

27. From the addition formula for $\cos(x+y)$ show that

$$\cos 2x = 2\cos^2x - 1 = 1 - 2\sin^2x.$$

Hence show that $\cos\frac{2}{3}\pi = -\frac{1}{2}$ gives $\cos\frac{1}{3}\pi = \frac{1}{2}$ and $\sin\frac{1}{3}\pi = \frac{1}{2}\sqrt{3}$. Check from the
trigonometric ratios of 60° ($\frac{1}{3}\pi$ radians) in an equilateral triangle.


*28. Beta functions. Take the definition $B(m, n) = \int_0^1 x^{m-1}(1-x)^{n-1}dx$ for
any real $m>0$, $n>0$. Substitute $x = \sin^2\theta$ to transform B(m, n) into
$2\int_0^{\pi/2}\sin^{2m-1}\theta\cos^{2n-1}\theta\,d\theta$ and deduce that $B(\frac{1}{2}, \frac{1}{2}) = \pi$. By this and the other
transform of B(m, n) given in 10.9 Ex. 27 show that $\pi$ can be expressed as

$$\int_0^1\frac{dx}{\sqrt{x(1-x)}} = \int_0^\infty\frac{dx}{(1+x)\sqrt{x}}.$$

Check the first from Ex. 8 above and the second by substituting $x = t^2$ in the
integral.

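*29. Gamma functions. Define $\Gamma(n) = \int_0^\infty x^{n-1}e^{-x}dx$ for any real $n>0$. Show
by the method of Ex. 6 above that $\Gamma(n) = (n-1)\Gamma(n-1)$. If n is integral,
deduce that $\Gamma(1) = 1$ and $\Gamma(n) = (n-1)!$

*30. Substitute $x = \frac{1}{2}t^2$ in $\Gamma(n)$ and show that $\Gamma(n) = \dfrac{1}{2^{n-1}}\int_0^\infty t^{2n-1}e^{-\frac{1}{2}t^2}dt$.
Given that the ‘normal distribution’ $y = \dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$ has unit area, check that
$\int_0^\infty y\,dx = \frac{1}{2}$, and that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.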


*31. Hypergeometric series. Write $F(\alpha, \beta; \gamma; x)$ for the sum of the hyper-
geometric series of 11.9 Ex. 29, absolutely convergent for $|x|<1$. Show that
several familiar series are particular cases: Binomial $(1-x)^{-m} = F(m, 1; 1; x)$;
Logarithmic $\log(1+x) = xF(1, 1; 2; -x)$; Inverse tangent $\tan^{-1}x = xF(\frac{1}{2}, 1;
\frac{3}{2}; -x^2)$.

*32. Write the hypergeometric series $F(\alpha, 1; 1; x/\alpha)$ for $|x|<\alpha$. Show that the
exponential series for $e^x$ can be regarded as the limit as $\alpha\to\infty$ of $F(\alpha, 1; 1; x/\alpha)$.
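As a numerical companion to Ex. 14, the exact doubling time can be set beside the rule of thumb. This is a sketch only, not a worked solution from the text: the function names are illustrative, and natural logarithms from Python's `math` module stand in for logarithm tables.

```python
import math

def doubling_time(p):
    """Exact n solving (1 + p/100)^n = 2."""
    return math.log(2) / math.log(1 + p / 100)

def rule_of_70(p):
    """Rule of thumb n = 70/p, from log 2 = 0.7 approximately."""
    return 70 / p

for p in (2, 5, 10):
    print(p, round(doubling_time(p), 2), rule_of_70(p))
```

For p = 5 the exact value is close to 14.2 while the rule gives 14; the rule drifts further from the exact value as p grows, since the neglected terms in $p^2$ then matter more.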


CHAPTER 13 
LINEAR ALGEBRA 


13.1. The basis of linear algebra. The concept of a vector space, 
though defined in purely algebraic terms, was used in Chapter 8 as a 
basis for geometric space and, in particular, for Euclidean space. 
The object now is to explore the purely algebraic properties and 
applications of vector spaces, the emphasis being on algebraic 
vectors. 

A set of vectors over a field of scalars is the foundation of linear
algebra, a vast subject with many applications. The idea of ‘linearity’
is present in the definition of a vector space: a set V of entities called
vectors, and subject to the operation of addition, together with an
outside set F of scalars used in the operation of scalar multiplication.
The result is that, given vectors $v_1, v_2, \dots$ of V and scalars $a_1, a_2, \dots$
of F, then $a_1v_1 + a_2v_2 + \dots$ is another vector of V. Hence, sums and
scalar products together serve to give the familiar ‘linear’ form
$a_1v_1 + a_2v_2 + \dots$ in a general setting: the algebraic sum of multiples of
certain vectors. The following simple cases indicate how varied is the
interpretation of the ‘linear’ form.

(i) Take V as the field of real numbers and F as the field of rationals.
A typical vector is the real variable x and a typical scalar is the
rational multiple a. The linear form is $a_1x_1 + a_2x_2 + \dots$, also a real
number of V. So real numbers form a vector space over the field of
rationals. This vector space handles linear forms of real variables
with rational coefficients.

(ii) Take V as the set of real number pairs (x, y) and F as the field
of real numbers. Add pairs by the rule:

$$(x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\ y_1 + y_2).$$

Define scalar products by the rule: $a(x, y) = (ax, ay)$. V is then a
vector space over F and it deals with linear combinations of the form:

$$a_1(x_1, y_1) + a_2(x_2, y_2) + \dots = (a_1x_1 + a_2x_2 + \dots,\ a_1y_1 + a_2y_2 + \dots).$$




This is, in fact, the primitive notion of a vector as a point P, or a
directed line OP, in a plane. For example, the mid-point of $P_1(x_1, y_1)$
and $P_2(x_2, y_2)$ is the linear combination:

$$\tfrac{1}{2}(x_1, y_1) + \tfrac{1}{2}(x_2, y_2) = \{\tfrac{1}{2}(x_1 + x_2),\ \tfrac{1}{2}(y_1 + y_2)\}.$$

Another interpretation of the same vector space is the field of
complex numbers, $z = (x, y) = x + iy$, over the field of real numbers.
The linear form is then:

$$a_1z_1 + a_2z_2 + \dots = (a_1x_1 + a_2x_2 + \dots) + i(a_1y_1 + a_2y_2 + \dots).$$

The representation of a complex number as a point on an Argand
diagram (2.5) is the link between the two aspects of the vector space.

(iii) Take V as the set of quadratic polynomials with rational (or
real) coefficients and F as the field of rational (or real) scalars. The
typical vector is then the triple (a, b, c) of elements of F or the
expression $ax^2 + bx + c$ in the undefined x. Adding and multiplying
quadratics by scalar multiples in the ordinary way, we get V as a
vector space. The linear combination of quadratics, with $\lambda$'s from
F, is:

$$\lambda_1(a_1x^2 + b_1x + c_1) + \lambda_2(a_2x^2 + b_2x + c_2) + \dots$$
$$= (\lambda_1a_1 + \lambda_2a_2 + \dots)x^2 + (\lambda_1b_1 + \lambda_2b_2 + \dots)x + (\lambda_1c_1 + \lambda_2c_2 + \dots).$$
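Cases (ii) and (iii) can be tried out in a few lines. The sketch below is illustrative only (the helper names `add` and `scale` and the sample values are assumptions): it forms the mid-point of two points as a linear combination of pairs, repeats the calculation with Python's built-in complex numbers, and combines two quadratics written as triples (a, b, c).

```python
def add(p, q):
    """Sum of two tuples, component by component."""
    return tuple(x + y for x, y in zip(p, q))

def scale(a, p):
    """Scalar product of the scalar a and the tuple p."""
    return tuple(a * x for x in p)

# (ii) Mid-point of P1(1, 2) and P2(3, 6) as (1/2)P1 + (1/2)P2.
p1, p2 = (1.0, 2.0), (3.0, 6.0)
midpoint = add(scale(0.5, p1), scale(0.5, p2))

# The same pairs read as complex numbers x + iy.
z = 0.5 * complex(*p1) + 0.5 * complex(*p2)

# (iii) Quadratics as triples (a, b, c): 2(x^2 + 3) - (4x + 1) = 2x^2 - 4x + 5.
q = add(scale(2, (1, 0, 3)), scale(-1, (0, 4, 1)))

print(midpoint, z, q)
```

The pair arithmetic and the complex arithmetic give the same point, which is the link between the two interpretations noted in (ii).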

The development of linear algebra is on the following lines. One
vector v of a vector space V is a linear combination of other vectors
$v_1, v_2, \dots$ if it can be written: $v = a_1v_1 + a_2v_2 + \dots$ for some scalars
$a_1, a_2, \dots$. The concept of a linear transformation, as in 7.5, can then
be made perfectly general: a mapping of one vector space V into
another vector space V' such that a linear combination of vectors of
V is mapped into the same linear combination of vectors of V'.

The general vector space V over F can be specialised to a more
practical form by taking vectors as n-tuples $v = (x_1, x_2, \dots x_n)$ of
values from the field F (usually real numbers) which also provides
the scalars for scalar products of the n-tuples. The space $V_n(F)$ of
n-tuples is then obtained and Euclidean space $E_n(F)$ is a particular
case (8.4). The algebraic concept of a space of n-tuples is of very wide
scope; example (ii) here is a case of $V_2(F)$ and example (iii) of
$V_3(F)$. The basic result, proved in 15.9, is that, apart from vector
spaces of special form (of ‘infinite’ dimension), any vector space V
whatever is isomorphic with, and algebraically indistinguishable
from, a space of n-tuples for some integral n, the dimension of V.




The general concept of a linear transformation becomes specialised,
and more familiar, when applied as a mapping of the space $V_n(F)$ of
n-tuples into the space $V_m(F)$ of m-tuples. If the n-tuple $(x_1, x_2, \dots x_n)$
maps into the m-tuple $(y_1, y_2, \dots y_m)$, then $y_1$ is a linear expression
in the n real variables $x_1, x_2, \dots x_n$, and similarly for $y_2, y_3, \dots y_m$.
For example, as in 7.5, if n = m = 2, then $y_1 = a_{11}x_1 + a_{12}x_2$ and
$y_2 = a_{21}x_1 + a_{22}x_2$ is the linear transformation, completely described
by the double array

$$\begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix}$$

of scalar values. This is the concept of a matrix, of two rows and
columns in the particular case n = m = 2, and generally of m rows and
n columns. The matrix notation is introduced initially to lighten the
algebraic burden of linear transformations and equations; it is later
found to have a great variety of other applications.

Consider the problem of ‘inverting’ a linear transformation T from
the space of n-tuples to that of m-tuples. T appears as linear ex-
pressions for $y_1, y_2, \dots y_m$ in terms of the variables $x_1, x_2, \dots x_n$. Can
we turn T around so that it gives expressions for $x_1, x_2, \dots x_n$ in
terms of the variables $y_1, y_2, \dots y_m$? T is arranged to provide values
of the y's when values are assigned to the x's; can T also turn the
trick the other way? Exactly the same problem appears in another
guise, that of the solution of linear equations. The linear transforma-
tion T can be written as m linear expressions in $x_1, x_2, \dots x_n$ equated
respectively to $y_1, y_2, \dots y_m$. Assign constant values $b_1, b_2, \dots b_m$ to
the y's. Then we obtain m linear equations in the x's. Can we find the
x's, i.e. solve the linear equations? We can, if we have already
inverted T. For then the x's are given in terms of the y's, and assign-
ing the particular values (the b's) to the y's we have the x's which
solve the linear equations. For example, if n = m = 2, then the
problem of inverting the linear transformation:

$$y_1 = a_{11}x_1 + a_{12}x_2 \quad\text{and}\quad y_2 = a_{21}x_1 + a_{22}x_2,$$

is the same as that of solving the linear equations:

$$a_{11}x_1 + a_{12}x_2 = b_1 \quad\text{and}\quad a_{21}x_1 + a_{22}x_2 = b_2.$$

Hence, we have the parallel problems of inverting a linear trans-
formation and of solving a set of linear equations, both of them
exercises in linear algebra.
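The parallel between the two problems can be seen in the case n = m = 2. The following sketch is an illustration under stated assumptions: the inversion formula used, and the condition $a_{11}a_{22} - a_{12}a_{21} \neq 0$ for it to work, are the standard ones for a double array of two rows and columns and are not derived in this passage; the function names and the sample coefficients are hypothetical. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def invert_2x2(a11, a12, a21, a22):
    """Coefficients of the inverse of y1 = a11 x1 + a12 x2, y2 = a21 x1 + a22 x2."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("transformation cannot be inverted (zero determinant)")
    d = Fraction(det)
    return (a22 / d, -a12 / d, -a21 / d, a11 / d)

def apply(m, v1, v2):
    """Apply the transformation with coefficients m to the pair (v1, v2)."""
    m11, m12, m21, m22 = m
    return (m11 * v1 + m12 * v2, m21 * v1 + m22 * v2)

T = (2, 1, 1, 3)                 # y1 = 2 x1 + x2,  y2 = x1 + 3 x2
T_inverse = invert_2x2(*T)

# Solving 2 x1 + x2 = 5, x1 + 3 x2 = 10 means applying the inverse to (5, 10).
x1, x2 = apply(T_inverse, 5, 10)
print(x1, x2)                    # the solution of the equations
print(apply(T, x1, x2))          # substituting back gives (5, 10)
```

Once the inverse is in hand, any other right-hand sides $b_1, b_2$ are handled by a further call to `apply`, which is the point made in the text.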




Linear algebra deals with a great variety of other problems. For
example, it provides the conditions under which the quadratic form:

$$a_{11}x_1^2 + a_{22}x_2^2 + \dots + 2a_{12}x_1x_2 + \dots > 0 \quad\text{for all } x_1, x_2, \dots$$


In its turn, this has applications in the calculus, in the problem of 
determining the maximum or minimum values of a function of 
several variables, with or without side relations and constraints. 


13.2. The structure of vector spaces. The general vector space 
V={u, v, w, ...} over the field F ={a, b,c, ...} is a set of double 
composition with the following properties (as in 8.3). The operation 
of addition is defined within V, the set of vectors being an additive 
group with all the operational rules (including the commutative one) 
being valid. The outside set F provides scalars so that a vector v 
can be multiplied by a scalar a to provide another vector av. Scalar 
products satisfy an associative rule: a (bv) =(ab)v and two distributive 
rules: a(u+v)=au+av and (a+b)v=av+bv. Particular scalars 
operate: $1 \times v = v$, $(-1) \times v = -v$ and $0 \times v = 0$. The last means that
the scalar zero times any vector gives the vector zero; the use of two 
zero elements (scalar and vector), with the same notation 0, need 
cause no trouble. 

The algebraic structure, rather than the geometric interpretation,
of a vector space is now examined, with emphasis on the linear
aspects. One vector v is a linear combination of, or depends on, a set
$\{v_1, v_2, \dots\}$ of vectors if there are scalars such that $v = a_1v_1 + a_2v_2 + \dots$ .*
From another aspect of the same property, a set $\{v_1, v_2, \dots\}$ of vectors
is linearly dependent if scalars, not all zero, can be found so that
$a_1v_1 + a_2v_2 + \dots = 0$, i.e. if some linear combination of the vectors
produces the zero vector. Suppose $a_1 \neq 0$, so that the condition for
linear dependence can be written:

$$v_1 = \left(-\frac{a_2}{a_1}\right)v_2 + \left(-\frac{a_3}{a_1}\right)v_3 + \dots = \lambda_2v_2 + \lambda_3v_3 + \dots \quad\text{for some scalars } \lambda_2, \lambda_3, \dots$$

Hence, in a linearly dependent set, at least one vector is a linear
combination of the others. On the other hand, a set $\{v_1, v_2, \dots\}$ of
vectors is linearly independent if no scalars can be found so that
$a_1v_1 + a_2v_2 + \dots = 0$, the trivial case $a_1 = a_2 = \dots = 0$ being excluded.
In such a set, no one vector is a linear combination of the
others.

* Some of the scalars $a_1, a_2, \dots$ can be zero and v is still said to depend on the full
set of vectors $\{v_1, v_2, \dots\}$, though in this case v also depends on some set of fewer
vectors.
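For vectors given as tuples of rational numbers, linear dependence can be tested mechanically. The sketch below uses row reduction (Gaussian elimination), a standard technique that is not developed in this passage and is brought in here only as an assumption: a set of vectors is linearly dependent exactly when the reduction leaves fewer non-zero rows than there are vectors. The function names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of non-zero rows left after row reduction of the given tuples."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # rows 0 .. r-1 already hold pivots
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_dependent(vectors):
    """True if some combination with scalars not all zero gives the zero vector."""
    return rank(vectors) < len(vectors)

print(linearly_dependent([(1, 2), (2, 4)]))           # second is twice the first
print(linearly_dependent([(1, 0), (0, 1)]))
print(linearly_dependent([(1, 1), (1, -1), (3, 5)]))  # three pairs
```

Exact fractions are used so that a zero produced by the reduction is a true zero and not a rounding accident.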

It is not implied that a vector space V has any linearly dependent,
or any linearly independent, vectors. This is still to be explored. It is
intuitively clear that V may include a few vectors which are linearly
independent but that, as more vectors are taken, the risk of linear
dependence increases. We look naturally for the largest set of in-
dependent vectors. On the other hand, there may be sets of vectors
in V on which all vectors (and not just one) of V depend, i.e. a set
$\{v_1, v_2, \dots\}$ such that every vector v is a linear combination of
$v_1, v_2, \dots$ : $v = \lambda_1v_1 + \lambda_2v_2 + \dots$ for scalars $\lambda_1, \lambda_2, \dots$. If such a set
exists, then it is said to span V. The implication is that a spanning
set of vectors is enough to describe V, all vectors being some linear
combination of them. Here, we look for a rather large set of vectors
to span V and, the fewer vectors we take, the greater the risk that
they fail to span V. We look naturally for the smallest set of vectors
which spans V. These matters are treated quite formally in 15.9,
where the following basic result is established.

There may be no finite set of vectors, however large in number, sufficient
to span V. The vector space is then said to be of infinite dimension.
Otherwise, there is a positive integer n which is both the largest
number of linearly independent vectors in V and the smallest
number of vectors spanning V. More specifically, there exists a set
$\{v_1, v_2, \dots v_n\}$ of n vectors such that the vectors are linearly inde-
pendent and span V. The vector space is said to be of dimension n
and such a set of n vectors is said to be a basis of the space. The
corollary is equally useful in practice: no set of more than n vectors
of V can be linearly independent; no set of fewer than n vectors can
span V. In particular, given $n+1$ vectors, they must be linearly
dependent; given only $n-1$ vectors, they are not enough to span V.

These general ideas can be applied to the special and practical case
of a space $V_n(F)$ of n-tuples over a field F (usually real numbers).
The field F is used to provide both the scalar a and the components
$x_1, x_2, \dots x_n$ of the n-tuple vector v. The first feature of $V_n(F)$ to
emphasise, and the one of most immediate use in practice, is the way
in which sums and scalar products are defined and hence the way in
which linear combinations of n-tuple vectors are constructed.


DEFINITION: In the space $V_n(F)$ of n-tuples, the sum of two n-tuples
$v = (x_1, x_2, \dots x_n)$ and $w = (y_1, y_2, \dots y_n)$ is defined as:

$$v + w = (x_1 + y_1,\ x_2 + y_2,\ \dots\ x_n + y_n)$$

and the scalar product of a and v is defined as:

$$av = (ax_1, ax_2, \dots ax_n).$$

This is very simple: to add n-tuple vectors, we add the components
separately; to multiply an n-tuple vector by a scalar, we multiply
each component by the scalar.
Repeated applications of the specified operations serve to build
up a linear combination of n-tuple vectors. Write m vectors:

$$v_1 = (x_{11}, x_{21}, \dots x_{n1});\quad v_2 = (x_{12}, x_{22}, \dots x_{n2});\quad \dots\quad v_m = (x_{1m}, x_{2m}, \dots x_{nm})$$

and a corresponding set of m scalars: $a_1, a_2, \dots a_m$. Then:

$$v = a_1v_1 + a_2v_2 + \dots + a_mv_m \qquad (1)$$

is a linear combination of the m vectors. The problem is to write the
components of $v = (x_1, x_2, \dots x_n)$ separately. By the scalar product
rule:

$$a_1v_1 = (a_1x_{11},\ a_1x_{21},\ \dots\ a_1x_{n1})$$
$$a_2v_2 = (a_2x_{12},\ a_2x_{22},\ \dots\ a_2x_{n2})$$
$$\dots$$
$$a_mv_m = (a_mx_{1m},\ a_mx_{2m},\ \dots\ a_mx_{nm}).$$

By the sum rule, the components of v are simply the sums of the
separate components of $a_1v_1, a_2v_2, \dots a_mv_m$, written above in vertical
line. Hence:

$$x_1 = a_1x_{11} + a_2x_{12} + \dots + a_mx_{1m}$$
$$x_2 = a_1x_{21} + a_2x_{22} + \dots + a_mx_{2m}$$
$$\dots$$
$$x_n = a_1x_{n1} + a_2x_{n2} + \dots + a_mx_{nm} \qquad (2)$$

Hence, writing the linear combination (1) for n-tuples simply means
that each component separately is the same linear combination as
given by (2). This is so basic, and so important in practice, that it is
useful to set it out formally:


THEOREM: In the space $V_n(F)$ of n-tuples, sums and scalar products
are so defined that a linear combination (1) of n-tuples implies corre-
sponding linear combinations (2) of the components separately.
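The passage from (1) to (2) is a single formula applied to each component in turn, and it can be written out as such. The following is a minimal sketch; the function name `combine` and the sample vectors are illustrative assumptions.

```python
def combine(scalars, vectors):
    """Components of v = a1 v1 + ... + am vm, each formed separately as in (2)."""
    n = len(vectors[0])
    return tuple(
        sum(a * v[i] for a, v in zip(scalars, vectors))  # x_i = a1 x_i1 + ... + am x_im
        for i in range(n)
    )

v1, v2, v3 = (1, 0, 2), (0, 1, -1), (2, 1, 3)

v = combine((2, -1, 1), (v1, v2, v3))      # 2 v1 - v2 + v3
zero = combine((2, 1, -1), (v1, v2, v3))   # 2 v1 + v2 - v3

print(v)
print(zero)
```

In the second combination every component comes out as zero although the scalars are not all zero, so these three sample vectors are linearly dependent.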




Hence, at its lowest, the form (1) is a convenient short-hand for the
more detailed (2). Whenever we write a linear combination of
n-tuples (1), we can always ‘equate components’ separately as in (2).
Conversely, if we are given a set of equations of form (2), we can
always shorten it into n-tuple vector form (1). There is, however,
more in the analysis than this; once appropriate algebraic rules are
developed, we find it much easier to operate with (1) than with (2).
One application of (1) and (2) is immediate. If $v_1, v_2, \dots v_m$ are
linearly dependent, then scalars $a_1, a_2, \dots a_m$ (not all zero) exist so
that (1) is the zero vector 0 = (0, 0, ... 0). This means that all the n
expressions on the right of (2) are zero. On the other hand, if the
vectors are linearly independent, then no such scalars exist. The n
expressions of (2) cannot be all zero together, i.e. for any scalars
$a_1, a_2, \dots a_m$ (not all zero) at least one of the expressions is non-zero.
The dimension of the space $V_n(F)$ of n-tuples, and a convenient
basis for the space, are easily found. The three examples of 13.1
illustrate:
(i) The vector space of real numbers over the field of rationals has
plenty of linear combinations, i.e. real numbers such as

$$\sqrt{2}(\sqrt{2}+1) = 2 \times 1 + 1 \times \sqrt{2},$$

a linear combination of 1 and $\sqrt{2}$. There are also plenty of linearly
independent sets of real numbers, e.g. 1, $\sqrt{2}$ and $\sqrt{3}$ which are such
that no rational a, b and c (not all zero) exist for $a + b\sqrt{2} + c\sqrt{3} = 0$. But no finite
set of real numbers $x_1, x_2, \dots x_n$ can be found so that every real
number x is a (rational) combination of them:

$$x = a_1x_1 + a_2x_2 + \dots + a_nx_n.$$

This is because of the inadequacy of rational multiples to take care
of the multiplicity of irrationals, including surds like $\sqrt{2}$ or $\sqrt{3}$ and
transcendentals like $\pi$ or e. The space is of infinite dimension; it is
not a case of $V_n(F)$.
(ii) The vector space of real number pairs (x, y), or of complex
numbers $z = x+iy$, over the field of real numbers, is an instance of
$V_2(F)$. By the rules for addition and scalar multiplication, any
number pair can be expressed:

$$(x, y) = x(1, 0) + y(0, 1) = x\epsilon_1 + y\epsilon_2 \quad\text{where } \epsilon_1 = (1, 0) \text{ and } \epsilon_2 = (0, 1) \qquad (3)$$

Hence the two vectors $\{\epsilon_1, \epsilon_2\}$ span the space. Moreover, they are
linearly independent since $a_1\epsilon_1 + a_2\epsilon_2 = a_1(1, 0) + a_2(0, 1) = (a_1, a_2) \neq 0$
(except in the excluded case $a_1 = a_2 = 0$). The vector space is of
dimension 2 and a basis is provided by $\epsilon_1$ and $\epsilon_2$. (3) shows the
dependence of every vector (x, y) on the basis. This is not the only
basis; indeed almost any pair of vectors can serve as a basis, e.g.
$\eta_1 = (1, 1)$ and $\eta_2 = (1, -1)$ is a basis with (3) replaced by:

$$(x, y) = \tfrac{1}{2}(x+y)\eta_1 + \tfrac{1}{2}(x-y)\eta_2.$$
Further insight is obtained by representing the number pairs (x, y)
in graphical terms, i.e. as points referred to axes Oxy (Fig. 13.2).
Let $P_1$ and $P_2$ be two given vectors. They are
linearly independent as long as they do not lie
on the same radius through O. For, linear
dependence of $P_1$ and $P_2$ implies $P_1$ is a multiple
of $P_2$, i.e. $OP_1$ and $OP_2$ are the same radius.
Moreover, any other point P can be expressed
as a linear combination of $P_1$ and $P_2$, by
multiplying by scalars to turn $P_1$ into $Q_1$ and $P_2$
into $Q_2$ in such a way that OP is the resultant (i.e. the vector sum)
of $OQ_1$ and $OQ_2$. Hence

$$OP = \lambda_1OP_1 + \lambda_2OP_2,$$

for $\lambda_1 = OQ_1/OP_1$ and $\lambda_2 = OQ_2/OP_2$. Hence, as long as $OP_1$ and $OP_2$
are distinct directions, the vectors $P_1$ and $P_2$ serve as a basis for the
vector space; this is what is meant by ‘almost any pair’ specified
above. The particular basis suggested for use is the pair of points
A (1, 0) on Ox and B (0, 1) on Oy. These mark off the units of
measurement on the axes and any point P (x, y) is expressed as
$x(1, 0) + y(0, 1)$, i.e. x units along Ox and y units along Oy.

(iii) The vector space of quadratics $ax^2 + bx + c$, with real coeffi-
cients, over the field of real numbers, corresponds to real number
triples (a, b, c) and it is a case of $V_3(F)$. The corresponding relation to
(3) is here:

$$(a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = a\epsilon_1 + b\epsilon_2 + c\epsilon_3 \qquad (4)$$

[Fig. 13.2]

so that the space is of dimension 3 and the three tuples $\epsilon_1 = (1, 0, 0)$,
$\epsilon_2 = (0, 1, 0)$ and $\epsilon_3 = (0, 0, 1)$ provide a basis for the space. Explicitly
in terms of quadratics, the basis is $\epsilon_1 = x^2$, $\epsilon_2 = x$ and $\epsilon_3 = 1$. The
dependence of any quadratic on the basis is given by (4), translated
into:

$$ax^2 + bx + c = a \times x^2 + b \times x + c \times 1.$$

There is a three-dimensional graphical representation, similar to
Fig. 13.2 but with the extra dimension. A quadratic or triple (a, b, c)
is shown by a point P (a, b, c) referred to axes Oabc. Three points
$P_1$, $P_2$ and $P_3$ are linearly dependent if they all lie in one plane
through O; they are linearly independent (and available as a basis)
if they do not. The basis suggested is given by the three points
A (1, 0, 0), B (0, 1, 0) and C (0, 0, 1) which are at unit distances
along the three axes.

The general result suggested by these examples, and established 
formally in 15.9, is the following: 


THEOREM: The space $V_n(F)$ of n-tuple vectors $(x_1, x_2, \ldots x_n)$
over the field F has dimension n and any set of n vectors, which are
both linearly independent and span $V_n(F)$, can be used as a basis. A
set of more than n vectors cannot be linearly independent. A set of
fewer than n vectors cannot span $V_n(F)$.


A convenient basis for $V_n(F)$ is provided by the n vectors:*

$$\epsilon_1=(1, 0, 0, \ldots 0),\quad \epsilon_2=(0, 1, 0, \ldots 0),\quad \ldots\quad \epsilon_n=(0, 0, 0, \ldots 1)$$

and the dependence of any vector on the basis is given by:

$$(x_1, x_2, \ldots x_n)=x_1\epsilon_1+x_2\epsilon_2+\ldots+x_n\epsilon_n \qquad (5)$$

Here, (5) for $V_n(F)$ is an obvious extension of (3) for $V_2(F)$ and
(4) for $V_3(F)$. The theorem shows that, while
$\epsilon_1, \epsilon_2, \ldots \epsilon_n$ are linearly independent,
the addition of one more vector produces a set which is not linearly
independent. Moreover, while $\epsilon_1, \epsilon_2, \ldots \epsilon_n$
do span $V_n(F)$, any smaller set of vectors cannot do so.


* The space $V_n(F)$ has zero vector (0, 0, ... 0) for the operation +,
but there is no unity since no operation × is implied. However, the n
vectors $\epsilon_1, \epsilon_2, \ldots \epsilon_n$ can be called unit
vectors in $V_n(F)$.


13.3. Linear transformations and linear equations. A transformation
is a mapping and a linear transformation is that particular case
which maps one vector space into another vector space, carrying
over a linear combination of vectors from one space to the other.
The linear transformation usually taken is a mapping of the space
$V_n(F)$ of n-tuples into the space $V_m(F)$ of m-tuples. This is
examined formally in 15.9 and here in particular cases.

As a simple basic case, though not quite the simplest, take a linear
transformation T which maps $V_3(F)$ into itself, i.e. which is a
mapping of three dimensions into three dimensions. The transformation
is from a triple $(x_1, x_2, x_3)$ of $V_3(F)$ into another triple
$(y_1, y_2, y_3)$ of $V_3(F)$. The problem is to specify T and to show
how the y's are obtained from the x's. The essential feature of T is
that a linear combination of triples of x's must be mapped into the
same linear combination of triples of y's.

The specification of T is made as follows. As the basis for $V_3(F)$
write the three vectors $\epsilon_1=(1, 0, 0)$, $\epsilon_2=(0, 1, 0)$
and $\epsilon_3=(0, 0, 1)$. Then:

$$(x_1, x_2, x_3)=x_1\epsilon_1+x_2\epsilon_2+x_3\epsilon_3 \qquad (1)$$

gives the dependence of any vector of $V_3(F)$ on the basis. Now
$\epsilon_1$ is a particular vector which is sent by T into some other
particular vector. Let the image of $\epsilon_1$ under T be the vector
$(a_{11}, a_{21}, a_{31})$, where T gives the scalar a's. Similarly,
let $\epsilon_2$ have image $(a_{12}, a_{22}, a_{32})$ and $\epsilon_3$
have image $(a_{13}, a_{23}, a_{33})$. Altogether there are 3 × 3 = 9
scalars specified by T, the constants $a_{rs}$ for r and s = 1, 2, 3.
These are, in fact, a complete specification of the linear
transformation T and the image $(y_1, y_2, y_3)$ of any vector
$(x_1, x_2, x_3)$ can be written in terms of the 9 a's. We are given,
therefore, under T:

$$\epsilon_1\to(a_{11}, a_{21}, a_{31});\quad \epsilon_2\to(a_{12}, a_{22}, a_{32});\quad \epsilon_3\to(a_{13}, a_{23}, a_{33}) \qquad (2)$$

By the preservation of linear combinations under T and by (1)
and (2):

$$(x_1, x_2, x_3)=x_1\epsilon_1+x_2\epsilon_2+x_3\epsilon_3\to x_1(a_{11}, a_{21}, a_{31})+x_2(a_{12}, a_{22}, a_{32})+x_3(a_{13}, a_{23}, a_{33}).$$

The linear combination of vectors on the right is the vector
$(y_1, y_2, y_3)$. By the rules for sums and scalar products, as
expressed in (1) and (2) of 13.2, we can write:

$$\begin{aligned}
y_1&=a_{11}x_1+a_{12}x_2+a_{13}x_3\\
y_2&=a_{21}x_1+a_{22}x_2+a_{23}x_3\\
y_3&=a_{31}x_1+a_{32}x_2+a_{33}x_3
\end{aligned} \qquad (3)$$

for the separate components. Hence, if $(x_1, x_2, x_3)$ is sent into
$(y_1, y_2, y_3)$ by T, then the y's are related to the x's by (3). The
linear relations (3) are a complete description of the linear
transformation T of $V_3(F)$ into itself, the a's being scalars fixed
by (2).

We have achieved in (3) an extension of the even simpler case of a
linear transformation in two dimensions, from $V_2(F)$ into itself:

$$y_1=a_{11}x_1+a_{12}x_2 \quad\text{and}\quad y_2=a_{21}x_1+a_{22}x_2$$

as examined in 7.5. The extension is an increase from two to three
dimensions on both sides, the x's and the y's alike. This is not
necessary; a linear transformation can be from one space $V_n(F)$ of
dimension n to another space $V_m(F)$ of different dimension m. Two
other simple cases illustrate.

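Suppose the linear transformation T maps the triples $(x_1, x_2, x_3)$
of $V_3(F)$ into the pairs $(y_1, y_2)$ of $V_2(F)$. Keeping
$\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ as the basis for $V_3(F)$,
we specify their images under T:

$$\epsilon_1\to(a_{11}, a_{21});\quad \epsilon_2\to(a_{12}, a_{22});\quad \epsilon_3\to(a_{13}, a_{23})$$

so that:

$$(x_1, x_2, x_3)=x_1\epsilon_1+x_2\epsilon_2+x_3\epsilon_3\to x_1(a_{11}, a_{21})+x_2(a_{12}, a_{22})+x_3(a_{13}, a_{23})=(y_1, y_2)$$

where

$$\begin{aligned}
y_1&=a_{11}x_1+a_{12}x_2+a_{13}x_3\\
y_2&=a_{21}x_1+a_{22}x_2+a_{23}x_3
\end{aligned} \qquad (4)$$

Hence T is given by (4), completely described by 2 × 3 = 6 scalars,
the a's.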

On the other hand, suppose T maps the pairs $(x_1, x_2)$ of $V_2(F)$
into the triples $(y_1, y_2, y_3)$ of $V_3(F)$. The basis for $V_2(F)$
is the pair of vectors $\epsilon_1=(1, 0)$ and $\epsilon_2=(0, 1)$ with
images under T taken as:

$$\epsilon_1\to(a_{11}, a_{21}, a_{31}) \quad\text{and}\quad \epsilon_2\to(a_{12}, a_{22}, a_{32})$$

and:

$$(x_1, x_2)=x_1\epsilon_1+x_2\epsilon_2\to x_1(a_{11}, a_{21}, a_{31})+x_2(a_{12}, a_{22}, a_{32})=(y_1, y_2, y_3)$$

where

$$\begin{aligned}
y_1&=a_{11}x_1+a_{12}x_2\\
y_2&=a_{21}x_1+a_{22}x_2\\
y_3&=a_{31}x_1+a_{32}x_2
\end{aligned} \qquad (5)$$

T given by (5) is described by 3 × 2 = 6 scalars, the a's.

The generalisation is now clear enough. If the linear transformation
T maps $V_n(F)$ into $V_m(F)$, write the images of the basis for
$V_n(F)$:

$$\epsilon_1\to(a_{11}, a_{21}, \ldots a_{m1});\quad \epsilon_2\to(a_{12}, a_{22}, \ldots a_{m2});\quad \ldots\quad \epsilon_n\to(a_{1n}, a_{2n}, \ldots a_{mn})$$

and, if $(x_1, x_2, \ldots x_n)\to(y_1, y_2, \ldots y_m)$, then:

$$\begin{aligned}
y_1&=a_{11}x_1+a_{12}x_2+\ldots+a_{1n}x_n\\
y_2&=a_{21}x_1+a_{22}x_2+\ldots+a_{2n}x_n\\
&\;\;\vdots\\
y_m&=a_{m1}x_1+a_{m2}x_2+\ldots+a_{mn}x_n
\end{aligned} \qquad (6)$$

Hence, the general form (6) of the linear transformation, from
$V_n(F)$ to $V_m(F)$, is specified by m × n scalars, the a's, as given
by the images of the basis $\epsilon_1, \epsilon_2, \ldots \epsilon_n$
taken for $V_n(F)$. The particular cases, (3), (4) and (5), have m=n,
m<n and m>n respectively.
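
The construction can be sketched in a few lines of code. The sketch below assumes the images of the basis vectors are supplied as tuples; the names `transformation_from_images` and `apply`, and the numerical images chosen, are hypothetical and serve only to illustrate the case m<n of (4).

```python
def transformation_from_images(images):
    """Build the m x n array of scalars a_rs from the images of the basis.

    images[s] is the m-tuple image of the (s+1)th basis vector; it supplies
    the scalars a_1s, a_2s, ... a_ms, i.e. one column of the array.
    """
    n = len(images)
    m = len(images[0])
    return [[images[s][r] for s in range(n)] for r in range(m)]

def apply(a, x):
    """Compute y_r = a_r1*x_1 + ... + a_rn*x_n for each r, as in (6)."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in a]

# Hypothetical images of the three basis vectors under a map of triples into pairs.
images = [(1, 0), (2, 1), (0, 3)]
a = transformation_from_images(images)
y = apply(a, (1, 1, 1))
```

Applying the array to a basis vector returns the corresponding image, so the m × n scalars do carry the whole specification of the transformation.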

In the linear transformation (6), suppose that we know that
$(x_1, x_2, \ldots x_n)$ of $V_n(F)$ maps into a particular and
specific vector $(b_1, b_2, \ldots b_m)$ of $V_m(F)$. Then the n-tuple
of x's is such that:

$$\begin{aligned}
a_{11}x_1+a_{12}x_2+\ldots+a_{1n}x_n&=b_1\\
a_{21}x_1+a_{22}x_2+\ldots+a_{2n}x_n&=b_2\\
&\;\;\vdots\\
a_{m1}x_1+a_{m2}x_2+\ldots+a_{mn}x_n&=b_m
\end{aligned} \qquad (7)$$

The problem is to find the x's, i.e. the solution of the set of linear
equations (7). Linear equations are a particular aspect of linear
transformations.

Hence, we have a dual problem: to invert (6) and to obtain the
x's in terms of the y's; or to solve (7) and to obtain the x's which
satisfy the equations for given b's. In both cases, the set of m × n
scalars, the a's, is given, the structure of the transformation or
equation system. If one problem is solved, so is the dual.

As a preliminary canter, we can examine the solution of the problem
in the simple case m=n=2. As shown in 7.5, elimination of one
variable to get the second is the method to follow. So the inverse of
the linear transformation, $y_1=a_{11}x_1+a_{12}x_2$ and
$y_2=a_{21}x_1+a_{22}x_2$, is:

$$\frac{x_1}{a_{22}y_1-a_{12}y_2}=\frac{x_2}{a_{11}y_2-a_{21}y_1}=\frac{1}{a_{11}a_{22}-a_{12}a_{21}} \qquad (8)$$

where $\Delta=a_{11}a_{22}-a_{12}a_{21}\neq 0$.

Equally, the solution of the equations:

$$a_{11}x_1+a_{12}x_2=b_1 \quad\text{and}\quad a_{21}x_1+a_{22}x_2=b_2$$

is:

$$\frac{x_1}{a_{22}b_1-a_{12}b_2}=\frac{x_2}{a_{11}b_2-a_{21}b_1}=\frac{1}{a_{11}a_{22}-a_{12}a_{21}} \qquad (9)$$

where $\Delta=a_{11}a_{22}-a_{12}a_{21}\neq 0$.

Here, (8) becomes (9) simply by assigning $y_1=b_1$ and $y_2=b_2$. In
(8), $x_1$ and $x_2$ are linear expressions in the y's, the inverse of
the linear transformation:

$$x_1=\frac{1}{\Delta}(a_{22}y_1-a_{12}y_2) \quad\text{and}\quad x_2=\frac{1}{\Delta}(-a_{21}y_1+a_{11}y_2).$$

In (9), $x_1$ and $x_2$ take specified values, the solution of the
equations. In each interpretation, the solution of the problem is
given in the main case where $\Delta\neq 0$. There remains the
degenerate case where $\Delta=0$, and where the linear transformation
cannot be inverted, or the equations solved, at least completely (see
13.9 Ex. 29).
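
The two-variable solution can be written as a short routine. This is a minimal sketch, assuming integer coefficients and using exact fractions; the name `solve_2x2` and the sample coefficients are illustrative choices, and the degenerate case is simply reported as an error rather than analysed.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1 and a21*x1 + a22*x2 = b2 by formula (9)."""
    delta = a11 * a22 - a12 * a21
    if delta == 0:
        # Degenerate case: the formula cannot be applied.
        raise ValueError("delta = 0: no unique solution")
    x1 = Fraction(a22 * b1 - a12 * b2, delta)
    x2 = Fraction(a11 * b2 - a21 * b1, delta)
    return x1, x2

# Illustrative system: 2*x1 + x2 = 5 and x1 + 3*x2 = 10.
x1, x2 = solve_2x2(2, 1, 1, 3, 5, 10)
```

Passing the y's in place of the b's gives the inverse transformation (8) with the same routine.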

The algebraic method used, i.e. the successive elimination of 
variables until only one is left, can be extended, though with in- 
creasing labour, to cases of more than two variables. In 13.9 Ex. 4, 
the inversion of (3) in the 3 x 3 case is achieved. In 13.9 Ex. 5, a similar 
attack is made on the 2 x 3 case of (4) and on the 3 x 2 case of (5). 
A difficulty arises; there is now a 'surplus' variable, one variable too
many, e.g. $x_3$ in (4) and $y_3$ in (5). A solution of the problem is
only attained if the surplus variable is given an assigned value.

This suggests two things. First, something must be done to sim- 
plify the algebra in inverting the general linear transformation (6), 
or in solving the general system of equations (7), to avoid having to 
slog out the result in each particular case. The matrix notation is 
introduced for these (and many other) purposes. Second, care must 
be taken to distinguish the various possibilities and, in particular, to 
keep a sharp eye open for any degenerate cases. 


13.4. Matrices. A matrix is a notation for an ordered set of m × n
elements, arranged in an array of m rows and n columns, for given
positive integers m and n. The elements can be entities of any kind,
typically real scalars, but also including such entities as polynomials
or functions (13.9 Ex. 6). In the following development, however, it is
taken for convenience that the elements are from some field F of
scalars.


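DEFINITION: A matrix of order m × n is a set of elements in m rows and
n columns:

$$\mathbf{A}=\|a_{rs}\|=\begin{Vmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{Vmatrix}$$

where r=1, 2, ... m denote the rows and s=1, 2, ... n the columns.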


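There are two alternative notations. In one, a single letter is used
for a matrix, printed in bold type $\mathbf{A}$ to indicate that it is
a complex of values (an array of m rows and n columns) and not a single
value. In the other, the elements of the matrix are spelled out in
their m × n array and bordered by double vertical lines. Variants of
this second notation are used:

$$\begin{Vmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{Vmatrix}
\qquad
\begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix}
\qquad
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{bmatrix}$$

all indicate the same matrix. In the shortened form of this notation,
i.e. $\|a_{rs}\|$, $(a_{rs})$ or $[a_{rs}]$, the general subscripts
need to be specified: r=1, 2, ... m and s=1, 2, ... n.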

Two special cases of matrices arise when m=1 and when n=1. In
the first case, we obtain a row vector which is an n-tuple of $V_n(F)$
where F is the field from which the elements are drawn:

$$\text{Matrix, order } 1\times n=\text{row vector } \|a_1\ a_2\ \ldots\ a_n\|=(a_1\ a_2\ \ldots\ a_n).$$

In the second case, we have a column vector, an m-tuple of $V_m(F)$:

$$\text{Matrix, order } m\times 1=\text{column vector } \begin{Vmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{Vmatrix}=\{a_1\ a_2\ \ldots\ a_m\}.$$

The alternative notation, to the $\|\ldots\|$ matrix notation, is to
write (...) for a row vector and {...} for a column vector; this saves
space while distinguishing row from column vectors.

A matrix of order m × n is built up from vectors, i.e. m row vectors
each of order 1 × n, or n column vectors each of order m × 1. It also
gives rise to various other vectors. For example, the matrix
$\|a_{rs}\|$ of order n × n has for its leading diagonal the n-tuple
vector $(a_{11}, a_{22}, \ldots a_{nn})$.

Two particular matrices of a given order require separate
notations:

NOTATION: The zero matrix of order m × n is $\mathbf{O}_{mn}$,
consisting of m rows and n columns of elements, all zero. The unit
matrix of order n × n is

$$\mathbf{I}_n=\begin{Vmatrix}
1 & 0 & \ldots & 0\\
0 & 1 & \ldots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \ldots & 1
\end{Vmatrix}$$

comprising a leading diagonal of elements all unity, and zero elements
elsewhere.

If the order is understood, without ambiguity, the zero matrix can
be denoted $\mathbf{O}$ and the unit matrix $\mathbf{I}$.

When the elements of matrices are summed in various ways, the
Σ notation of 1.7 is of particular use. For example, the sum of the
elements of the rth row of $\mathbf{A}=\|a_{rs}\|$ can be written
$\sum_{s=1}^{n}a_{rs}$ for r=1, 2, ... m. The most important use of the
Σ notation here is in writing inner products; these are scalar values
which appear in multiplying matrices. The notation can be given for
vectors as well as for matrices generally:

NOTATION: Two vectors $\mathbf{a}=(a_1, a_2, \ldots a_n)$ and
$\mathbf{b}=(b_1, b_2, \ldots b_n)$, each of n elements, give the inner
product:

$$\mathbf{a}\cdot\mathbf{b}=\sum_{s=1}^{n}a_sb_s=a_1b_1+a_2b_2+\ldots+a_nb_n.$$

Two matrices $\mathbf{A}=\|a_{rt}\|$ of order m × k and
$\mathbf{B}=\|b_{ts}\|$ of order k × n are conformable and give m × n
inner products:

$$\sum_{t=1}^{k}a_{rt}b_{ts}=a_{r1}b_{1s}+a_{r2}b_{2s}+\ldots+a_{rk}b_{ks}$$

for r=1, 2, ... m and s=1, 2, ... n.




The definition of conformable matrices, implicit in this notation, is to
be noticed. Not all pairs of matrices are conformable by any means.
It is necessary that A has the same number (k) of columns as B has
rows if A is to be conformable with B. Moreover, matrices are
conformable in a particular order, i.e. A of order m × k with B of
order k × n. Reverse the order and the matrices are not generally
conformable, i.e. B of order k × n is not conformable with A of order
m × k (except in the case where m=n). It is always essential, before
writing inner products, to check that the matrices used are conformable
and to stick to the order in which they are conformable.
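
The check for conformability, and the inner product itself, can be sketched as follows. The sketch assumes a matrix is stored as a list of rows; the helper names `inner_product`, `order` and `conformable` are illustrative, as are the sample matrices.

```python
def inner_product(a, b):
    """Return a_1*b_1 + a_2*b_2 + ... + a_n*b_n for two vectors of n elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(a_s * b_s for a_s, b_s in zip(a, b))

def order(m):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(m), len(m[0])

def conformable(a, b):
    """True when a has as many columns as b has rows (order matters)."""
    return order(a)[1] == order(b)[0]

a = [[1, -1], [1, 0], [0, 1]]   # order 3 x 2
b = [[0, 1], [-1, 0]]           # order 2 x 2
a_with_b = conformable(a, b)    # 2 columns against 2 rows
b_with_a = conformable(b, a)    # 2 columns against 3 rows
```

The two results differ, which illustrates that conformability holds in a particular order only.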


13.5. Operational rules for matrices. Notations are always subject 
to specified algebraic operations, usually of a simple kind. The matrix 
notation is quite exceptional in that it is handled algebraically by a 
very elaborate system of operations, the subject of matrix algebra. 
A matrix is a new symbol, for an entity of a new kind and one of very 
wide scope. We are at liberty to define operations on matrices in any 
way we find convenient. Amongst those we choose to define are 
operations labelled ‘addition’ and ‘multiplication’ of matrices. They 
are specified with an eye on the applications of matrices, particularly 
to linear transformations and linear equations. However, as in 
Boolean algebra of Chapter 4, choice of the labels is made as a way 
out of a dilemma. The operation of ‘multiplication’ of matrices, in 
particular, does not satisfy all the familiar multiplicative rules, as the 
label might suggest. It is not the same kind of operation as the multi- 
plication of real or complex numbers, or of the elements of a field 
generally. Not only is the commutative rule invalid, but matrix 
products generally lack reciprocals and fail to meet the cancellation 
rule. It might be preferable to use a different term, e.g. to talk of the 
‘conformation’ of matrices, rather than their ‘multiplication’, since 
the operation is limited to matrices which are conformable in the 
sense of 13.4. However, ‘multiplication’ is the label in general use, 
and we must employ it, though with great care in remembering that 
not all the usual multiplicative rules apply. 

Four sets of definitions are required for matrices of various orders: 

DEFINITION: Equality and inequality. If $\mathbf{A}=\|a_{rs}\|$ and
$\mathbf{B}=\|b_{rs}\|$ are of the same order m × n, then
$\mathbf{A}=\mathbf{B}$ if $a_{rs}=b_{rs}$ and $\mathbf{A}>\mathbf{B}$
if $a_{rs}>b_{rs}$ for all r=1, 2, ... m and s=1, 2, ... n.

As a particular case, take $\mathbf{B}=\mathbf{O}$ of order m × n. Then
$\mathbf{A}=\mathbf{O}$ means that each element of A is zero and
$\mathbf{A}>\mathbf{O}$ that each element of A is positive. As opposed
to the positive matrix $\mathbf{A}>\mathbf{O}$, we can write the
non-negative matrix $\mathbf{A}\geq\mathbf{O}$, meaning $a_{rs}\geq 0$
for all r and s (except that the case all $a_{rs}=0$ is excluded).
Hence $\mathbf{A}\geq\mathbf{O}$ covers a range of possibilities: the
case where $a_{rs}>0$ for all r and s, together with cases where
$a_{rs}>0$ for some r and s and $a_{rs}=0$ for other r and s. (See
13.9, Ex. 10.)


DEFINITION: Addition and scalar products. If $\mathbf{A}=\|a_{rs}\|$
and $\mathbf{B}=\|b_{rs}\|$ are of the same order m × n, then the sum
$\mathbf{A}+\mathbf{B}$ is the matrix $\mathbf{C}=\|c_{rs}\|$ where
$c_{rs}=a_{rs}+b_{rs}$, and the scalar product $\lambda\mathbf{A}$ is
the matrix $\mathbf{D}=\|d_{rs}\|$ where $d_{rs}=\lambda a_{rs}$, for
all r=1, 2, ... m and s=1, 2, ... n.

These definitions are simple enough. Any matrix can be multiplied
by a scalar λ; it is simply a matter of multiplying each element by λ.
Any two matrices of the same order can be added; each element of
one matrix is simply added to the corresponding element of the other.
Within the set of all matrices of a given order m × n, the operations
of addition and scalar multiplication are closed. On the other hand,
there is no meaning attached to the addition of matrices of different
order.

A direct consequence of the definition is that all the rules for the 
operations of sums and scalar products are satisfied: 


THEOREM: The set of all matrices of a given order m xn is a vector 
space over the field from which scalars are drawn. 


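To prove, it is a matter of checking the rules from the definition of
sums and scalar products. The set of all m × n matrices is closed under
sums and scalar products, sums are commutative and associative:

$$\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A} \quad\text{and}\quad \mathbf{A}+(\mathbf{B}+\mathbf{C})=(\mathbf{A}+\mathbf{B})+\mathbf{C}$$

and scalar products are associative and distributive:

$$\lambda(\mu\mathbf{A})=(\lambda\mu)\mathbf{A};\quad \lambda(\mathbf{A}+\mathbf{B})=\lambda\mathbf{A}+\lambda\mathbf{B};\quad (\lambda+\mu)\mathbf{A}=\lambda\mathbf{A}+\mu\mathbf{A}.$$

There is an identity element for addition, the zero vector
$\mathbf{O}$ of order m × n:

$$\mathbf{A}+\mathbf{O}=\mathbf{O}+\mathbf{A}=\mathbf{A}$$

and there is an additive inverse, the negative $-\mathbf{A}$ of
$\mathbf{A}$:

$$\mathbf{A}+(-\mathbf{A})=(-\mathbf{A})+\mathbf{A}=\mathbf{O}.$$

Here, if $\mathbf{A}=\|a_{rs}\|$, then $-\mathbf{A}=\|(-a_{rs})\|$ and
the same negative $-\mathbf{A}$ is obtained by multiplication of
$\mathbf{A}$ by the scalar −1. With a negative defined, the difference
of two vectors follows at once:
$\mathbf{A}-\mathbf{B}=\mathbf{A}+(-\mathbf{B})$. If
$\mathbf{A}=\|a_{rs}\|$ and $\mathbf{B}=\|b_{rs}\|$, then
$\mathbf{A}-\mathbf{B}=\|(a_{rs}-b_{rs})\|$.

Finally, a zero difference is to be associated with equality:
$\mathbf{A}-\mathbf{B}=\mathbf{O}$ implies $\mathbf{A}=\mathbf{B}$.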
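
The element-by-element definitions translate directly into code. The following is a minimal sketch, assuming matrices stored as lists of rows; the function names `add`, `scalar_multiple` and `subtract` and the sample matrices are illustrative.

```python
def add(a, b):
    """Sum of two matrices of the same order, element by element."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("matrices of different order cannot be added")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def scalar_multiple(lam, a):
    """Scalar product: every element of the matrix multiplied by lam."""
    return [[lam * x for x in row] for row in a]

def subtract(a, b):
    """Difference A - B, defined as A + (-1)B."""
    return add(a, scalar_multiple(-1, b))

a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
total = add(a, b)
doubled = scalar_multiple(2, a)
zero = subtract(a, a)
```

The difference of a matrix with itself is the zero matrix of the same order, in line with the remark that a zero difference goes with equality.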

DEFINITION: Multiplication. If $\mathbf{A}=\|a_{rt}\|$ of order m × k
is conformable with $\mathbf{B}=\|b_{ts}\|$ of order k × n, then the
product $\mathbf{AB}$ is the matrix $\mathbf{C}=\|c_{rs}\|$ of order
m × n where $c_{rs}$ is the inner product:

$$c_{rs}=\sum_{t=1}^{k}a_{rt}b_{ts} \quad\text{for } r=1, 2, \ldots m \text{ and } s=1, 2, \ldots n.$$


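This definition appears complicated, but it is deliberately designed
to agree with successive applications of linear transformations. For
example:

Write two matrices

$$\mathbf{A}=\begin{Vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{Vmatrix} \quad\text{and}\quad \mathbf{B}=\begin{Vmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{Vmatrix}$$

of order 2 × 2. They are conformable and yield, by the definition, the
product

$$\mathbf{AB}=\begin{Vmatrix} c_{11} & c_{12}\\ c_{21} & c_{22} \end{Vmatrix}$$

where:

$$\begin{aligned}
c_{11}&=a_{11}b_{11}+a_{12}b_{21} & c_{12}&=a_{11}b_{12}+a_{12}b_{22}\\
c_{21}&=a_{21}b_{11}+a_{22}b_{21} & c_{22}&=a_{21}b_{12}+a_{22}b_{22}.
\end{aligned}$$

Now take the linear transformations:

$$\begin{aligned}
z_1&=a_{11}y_1+a_{12}y_2 & y_1&=b_{11}x_1+b_{12}x_2\\
z_2&=a_{21}y_1+a_{22}y_2 & y_2&=b_{21}x_1+b_{22}x_2
\end{aligned}$$

and combine them (in succession) to give:

$$\begin{aligned}
z_1&=a_{11}(b_{11}x_1+b_{12}x_2)+a_{12}(b_{21}x_1+b_{22}x_2)\\
z_2&=a_{21}(b_{11}x_1+b_{12}x_2)+a_{22}(b_{21}x_1+b_{22}x_2)
\end{aligned}$$

i.e.

$$\begin{aligned}
z_1&=c_{11}x_1+c_{12}x_2\\
z_2&=c_{21}x_1+c_{22}x_2.
\end{aligned}$$

The two given linear transformations have matrices $\mathbf{A}$ and
$\mathbf{B}$ of coefficients. The combination (product) of these has
matrix $\mathbf{AB}$.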




To write the product of matrices AB in practice, the first step is
to check that A is conformable with B, i.e. order m × k combines with
order k × n (the k being common) to give order m × n. Then the m × n
elements of AB are written down in succession by the rule of inner
products, a rule which can be described as 'across and down'. The
element $c_{rs}$ of AB is got by reading across the rth row of A and
down the sth column of B, multiplying corresponding elements and adding
the results. So $c_{11}$ is across the first row of A and down the
first column of B; $c_{12}$ is across the first row of A and down the
second column of B; and so on.
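
The 'across and down' rule, and its agreement with successive linear transformations, can be sketched in code. This is an illustrative sketch only: the names `matmul` and `apply`, the two 2 × 2 matrices and the vector x are assumptions made for the example, with matrices stored as lists of rows.

```python
def matmul(a, b):
    """Product AB: element c_rs is across row r of A and down column s of B."""
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("A is not conformable with B")
    return [[sum(a[r][t] * b[t][s] for t in range(k)) for s in range(n)]
            for r in range(m)]

def apply(a, x):
    """Linear transformation with matrix a applied to the vector x."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in a]

a = [[1, -1], [1, 0]]
b = [[0, 1], [-1, 0]]
ab = matmul(a, b)
ba = matmul(b, a)

# Transform x by B and then by A, and compare with the single matrix AB.
x = [2, 5]
z_stepwise = apply(a, apply(b, x))
z_product = apply(ab, x)
```

The stepwise and single-matrix results coincide, while `ab` and `ba` differ, so the order of the factors has to be kept.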

Notice, in particular, that it is only in special cases that both AB
and BA are defined, and that (even if they are) they are not generally
the same matrices. Products of matrices are not to be taken as
commutative. So, if A is of order m × k and B of order k × n, then
AB can be written. But B of order k × n is not conformable with A
of order m × k if m ≠ n, and no product BA can be written. If m=n,
then BA exists, but it is not generally the same matrix as AB. Some
examples illustrate:

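$$\text{(i)}\quad \begin{Vmatrix} 1 & -1\\ 1 & 0\\ 0 & 1 \end{Vmatrix}\times\begin{Vmatrix} 0 & 1\\ -1 & 0 \end{Vmatrix}=\begin{Vmatrix} 1 & 1\\ 0 & 1\\ -1 & 0 \end{Vmatrix}$$

but the product in reverse order does not exist. Notice that the given
matrices are of order 3 × 2 and 2 × 2, which are conformable and give
a product $\|c_{rs}\|$ of order 3 × 2. The inner product, or across and
down, rule then provides:

$$c_{11}=1\times 0+(-1)\times(-1)=1;\quad c_{12}=1\times 1+(-1)\times 0=1$$

and so on.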


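$$\text{(ii)}\quad \begin{Vmatrix} 1 & -1\\ 1 & 0 \end{Vmatrix}\times\begin{Vmatrix} 0 & 1\\ -1 & 0 \end{Vmatrix}=\begin{Vmatrix} 1 & 1\\ 0 & 1 \end{Vmatrix}$$

and

$$\begin{Vmatrix} 0 & 1\\ -1 & 0 \end{Vmatrix}\times\begin{Vmatrix} 1 & -1\\ 1 & 0 \end{Vmatrix}=\begin{Vmatrix} 1 & 0\\ -1 & 1 \end{Vmatrix}.$$

Both products exist, but they are different matrices.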


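$$\text{(iii)}\quad \begin{Vmatrix} 1 & -1\\ 1 & 0\\ 0 & 1 \end{Vmatrix}\times\begin{Vmatrix} 0\\ -1 \end{Vmatrix}=\begin{Vmatrix} 1\\ 0\\ -1 \end{Vmatrix}$$

and

$$\begin{Vmatrix} 1 & 0 & -1 \end{Vmatrix}\times\begin{Vmatrix} 1 & -1\\ 1 & 0\\ 0 & 1 \end{Vmatrix}=\begin{Vmatrix} 1 & -2 \end{Vmatrix}$$

are two (different) cases illustrating that a matrix can be multiplied
by appropriate row or column vectors.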

Since only particular matrices (those which happen to conform)
have products, there is generally no question that a comprehensive
set of matrices is closed under multiplication, no question that the
set is a ring or a field. In particular, in the set of all matrices of
given order m × n, no products are defined at all when m ≠ n. The case
where m=n is considered later (13.6).

However, for those matrices A, B, C, ... which do happen to conform,
the following properties hold (13.9 Ex. 16-19):

Associative: $\mathbf{A}(\mathbf{BC})=(\mathbf{AB})\mathbf{C}$
Distributive: $\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{AB}+\mathbf{AC}$ and $(\mathbf{A}+\mathbf{B})\mathbf{C}=\mathbf{AC}+\mathbf{BC}$
Zero matrices: $\mathbf{A}\mathbf{O}_{nn}=\mathbf{O}_{mm}\mathbf{A}=\mathbf{O}_{mn}$ (A of order m × n)
Unit matrices: $\mathbf{A}\mathbf{I}_n=\mathbf{I}_m\mathbf{A}=\mathbf{A}$ (A of order m × n).

These are much used in practice. On the other hand, apart from the
non-commutative feature of products of matrices, it is generally the
case that reciprocals are lacking and that cancellation is not valid.
It may be found, for conformable matrices, that
$\mathbf{AB}=\mathbf{O}$; this does not imply that either
$\mathbf{A}=\mathbf{O}$ or $\mathbf{B}=\mathbf{O}$. For example, the
following two non-zero matrices multiply to zero:

$$\begin{Vmatrix} 1 & 1\\ 0 & 0 \end{Vmatrix}\times\begin{Vmatrix} 1 & 1\\ -1 & -1 \end{Vmatrix}=\begin{Vmatrix} 0 & 0\\ 0 & 0 \end{Vmatrix}=\mathbf{O}.$$

So, in a set of 2 × 2 matrices (for example), there can be divisors of
zero.

The last definition is of a different kind, but a very simple one. It
relates to the operation of interchanging the rows and columns of a
matrix:


DEFINITION: Transposition. If $\mathbf{A}=\|a_{rs}\|$ is of order
m × n, then the transpose $\mathbf{A}'=\|a_{sr}\|$ is of order n × m.

Several properties follow from the definition:

$$(\mathbf{A}+\mathbf{B})'=\mathbf{A}'+\mathbf{B}';\quad (\mathbf{AB})'=\mathbf{B}'\mathbf{A}';\quad (\mathbf{A}')'=\mathbf{A};\quad \mathbf{I}_n'=\mathbf{I}_n$$

for matrices of appropriate orders. The last of these properties, that
a unit matrix is unchanged when transposed, raises the question of
what kind of matrix A has the property: $\mathbf{A}'=\mathbf{A}$. It is
easily seen that A must be of order n × n and that its elements are
symmetrical about the leading diagonal (13.9 Ex. 20).
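
The transposition properties can be checked on particular matrices. The sketch below is illustrative only: `transpose`, `matmul` and `is_symmetric` are hypothetical helper names, matrices are stored as lists of rows, and the sample matrices are arbitrary.

```python
def transpose(a):
    """Interchange rows and columns: element (s, r) of the result is a_rs."""
    return [list(column) for column in zip(*a)]

def matmul(a, b):
    """Product AB by the inner product rule (A must be conformable with B)."""
    if len(a[0]) != len(b):
        raise ValueError("A is not conformable with B")
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]

def is_symmetric(a):
    """True when the matrix equals its own transpose."""
    return a == transpose(a)

a = [[1, -1], [1, 0], [0, 1]]   # order 3 x 2
b = [[0, 1], [-1, 0]]           # order 2 x 2

left = transpose(matmul(a, b))              # (AB)'
right = matmul(transpose(b), transpose(a))  # B'A'
```

Note the reversal of order on the right-hand side: the product A′B′ is not even defined here, since A′ is of order 2 × 3 and B′ of order 2 × 2.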


13.6. Square matrices. A matrix of order n xn, for a positive integer 
n, is called a square matrix; it has the same number of rows as 
columns.* The set of all square matrices of a given order would 
appear to have very tidy properties. All the rules for sums and 
scalar products still apply, so that the set is still a vector space over 
the field of scalars. Moreover, every matrix of the set is conformable 
with every other matrix and the product is also a matrix of the set. 
If A and B are of order n × n, then AB and BA both exist, also as
matrices of order n × n. Hence the set should be obedient under the
operation of multiplication of matrices, except (as expected) that
products are not commutative: AB and BA exist but AB ≠ BA in
general.

In pursuing this matter, we note that there is no difficulty about
either zero or unit matrices of square form:

$$\mathbf{AO}=\mathbf{OA}=\mathbf{O} \quad\text{and}\quad \mathbf{AI}=\mathbf{IA}=\mathbf{A}$$

all matrices written being of order n × n. The remaining question,
still to be answered, is whether reciprocals exist, i.e. whether there
is a square matrix $\mathbf{A}^{-1}$ to match the given square matrix
$\mathbf{A}$:

$$\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}.$$


To get an answer, we need to go a good way off course in introducing
the concept of a determinant. A square matrix $\mathbf{A}=\|a_{rs}\|$
is an ordered array of n × n scalars (e.g. numerical values).† Then the
determinant $\Delta=|\mathbf{A}|=|a_{rs}|$ to be defined is a single
scalar to be derived from the n × n scalars of $\mathbf{A}$. The
definition is built up on the principle of mathematical induction; a
determinant is first specified very simply in the case n=1 and then a
rule is laid down to express determinants of order n in terms of those
of order n−1. A notation is required:

NOTATION: If $\mathbf{A}=\|a_{rs}\|$ is a matrix of order n × n, write
$\mathbf{A}_{rs}$ for the matrix of order (n−1) × (n−1) obtained by
deleting from $\mathbf{A}$ the rth row and the sth column, intersecting
in the element $a_{rs}$. If $\Delta=|\mathbf{A}|$ is the determinant of
$\mathbf{A}$, then the co-factor of $a_{rs}$ in Δ is the determinant of
$\mathbf{A}_{rs}$ with an appropriate sign:
$A_{rs}=(-1)^{r+s}\,|\mathbf{A}_{rs}|$.

The rule for a determinant Δ of order n is given in terms of the
co-factors $A_{rs}$, i.e. in terms of determinants of order n−1.

* A matrix of order m × n, where m ≠ n, can likewise be called
'rectangular'. These terms have no geometric content; they refer only
to the appearance of the array of elements of the matrix.

† This is the usual case but it can be extended to a square matrix of
elements of any kind (e.g. polynomials), in which case the determinant
is a single entity of the same kind (e.g. a single polynomial).
Determinants arise in quite elementary algebra, in connection with the
solution of linear equations. The impression is sometimes conveyed that
a matrix is a generalised determinant. It is nothing of the sort. A
matrix is an array of elements; a determinant is an algebraic
expression in the elements. These are two quite different concepts.


DEFINITION: The determinant $\Delta=|\mathbf{A}|=|a_{rs}|$ of order n
is obtained from a square matrix $\mathbf{A}=\|a_{rs}\|$ of order
n × n. When n=1, $\Delta=a_{11}$, a single element. The rule for a
determinant of order n in terms of determinants of order n−1 is:

$$\Delta=\sum_{t=1}^{n}a_{1t}A_{1t} \qquad (1)$$

where $A_{1t}$ is the co-factor of $a_{1t}$ (t=1, 2, ... n).

From the definition, determinants of successive orders are
evaluated step by step. The relation (1) is in terms of the elements of
the first row of whatever matrix is considered, and of the appropriate
co-factors of these elements. So:

n=1: $\mathbf{A}=\|a_{11}\|$ and $\Delta=a_{11}$.

n=2: $\mathbf{A}=\begin{Vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{Vmatrix}$
and $\Delta=a_{11}\,|a_{22}|-a_{12}\,|a_{21}|=a_{11}a_{22}-a_{12}a_{21}$.

Here the determinant Δ is the 'cross-product'
$a_{11}a_{22}-a_{12}a_{21}$ of the four elements of $\mathbf{A}$. It is
the expression already met in inverting a 2 × 2 linear transformation
or in solving two linear equations in two variables (13.3 above).

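n=3: $\mathbf{A}=\begin{Vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{Vmatrix}$

and

$$\Delta=a_{11}\begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{vmatrix}-a_{12}\begin{vmatrix} a_{21} & a_{23}\\ a_{31} & a_{33} \end{vmatrix}+a_{13}\begin{vmatrix} a_{21} & a_{22}\\ a_{31} & a_{32} \end{vmatrix}.$$

Using the 'cross-products' for the determinants of second order:

$$\Delta=a_{11}(a_{22}a_{33}-a_{23}a_{32})-a_{12}(a_{21}a_{33}-a_{23}a_{31})+a_{13}(a_{21}a_{32}-a_{22}a_{31})$$

i.e.

$$\Delta=a_{11}a_{22}a_{33}-a_{11}a_{23}a_{32}+a_{12}a_{23}a_{31}-a_{12}a_{21}a_{33}+a_{13}a_{21}a_{32}-a_{13}a_{22}a_{31} \qquad (2)$$

This appears, as in 13.9 Ex. 4, in the inversion of a 3 × 3 linear
transformation.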
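
The inductive definition lends itself to a recursive routine. The following is a minimal sketch of expansion by the first row, as in rule (1); the names `minor` and `det` and the test matrices are illustrative, and no claim is made that this is an efficient method for large orders.

```python
def minor(a, r, s):
    """Matrix obtained from a by deleting row r and column s (0-indexed)."""
    return [row[:s] + row[s + 1:] for i, row in enumerate(a) if i != r]

def det(a):
    """Determinant by expansion along the first row, as in rule (1)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    # With 0-indexed t the sign (-1)**t matches (-1)**(1 + t) for 1-indexed t.
    return sum((-1) ** t * a[0][t] * det(minor(a, 0, t)) for t in range(n))

d2 = det([[1, 2], [3, 4]])                     # cross-product 1*4 - 2*3
d3 = det([[2, 0, 1], [1, 3, 2], [1, 1, 4]])    # six-term expansion as in (2)
```

Each call on a matrix of order n makes n calls on matrices of order n−1, which is why the full expansion grows so quickly with n.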

The process continues for n=4, 5, ..., by repeated use of (1), and
the expansion for Δ in terms of the a's becomes increasingly involved.
However, the pattern is already clear in (2) for n=3. In general, for
Δ of order n, there are n! terms in the full expansion, each consisting
of n elements, one from each row and one from each column of
$\mathbf{A}$. By the symmetry of this expression for Δ, it follows that
the expansion rule (1) need not be confined to the first row of
elements in $\mathbf{A}$; it could equally well be given in terms of
the rth row:

$$\Delta=\sum_{t=1}^{n}a_{rt}A_{rt} \qquad (r=1, 2, \ldots n).$$

Further (1) need not be given only for rows; it is equally applicable
to columns: $\Delta=\sum_{t=1}^{n}a_{ts}A_{ts}$ (s=1, 2, ... n). Again,
from the pattern of plus and minus terms in (2), it appears that Δ=0 if
one row of elements in $\mathbf{A}$ is identical (element by element)
with another row. For example, put $a_{11}=a_{21}$, $a_{12}=a_{22}$,
... and Δ=0. Making this substitution in (1), we obtain:
$\sum_{t=1}^{n}a_{2t}A_{1t}=0$. This implies that, if the elements of
the second row of $\mathbf{A}$ are multiplied by the co-factors of a
different row (the first) and the products added, then the result is
zero. The result is generalised for any pair of rows:
$\sum_{t=1}^{n}a_{rt}A_{st}=0$ (r ≠ s); and for any pair of columns:
$\sum_{t=1}^{n}a_{tr}A_{ts}=0$ (r ≠ s). The two sets of results can be
assembled in expansion rules for determinants:

THEOREM: The expansion of a determinant $\Delta=|a_{rs}|$ of order n in
terms of co-factors proceeds either by rows or by columns:

$$\begin{aligned}
\text{By rows:}\quad &\sum_{t=1}^{n}a_{rt}A_{st}=\Delta\ (r=s) \quad\text{and}\quad =0\ (r\neq s)\\
\text{By columns:}\quad &\sum_{t=1}^{n}a_{tr}A_{ts}=\Delta\ (r=s) \quad\text{and}\quad =0\ (r\neq s)
\end{aligned} \qquad (3)$$

The expansion rules are essentially simple: if elements and co-factors
of the same row or column are taken, the result is Δ; if elements and
co-factors of different rows or columns, the result is zero.

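Notice that the notation for determinants is an exact parallel of
that for matrices, single vertical lines |...| being used instead of
double lines ||...||. So the matrix $\mathbf{A}=\|a_{rs}\|$ gives the
determinant $\Delta=|a_{rs}|$. In full:

$$\begin{Vmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{Vmatrix}
\quad\text{and}\quad
\begin{vmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{vmatrix}$$

represent a square matrix of order n × n and its determinant of
order n.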

Square matrices of order n xn can be divided into two classes, 
according as the determinant is zero or non-zero: 


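DEFINITION: The square matrix $\mathbf{A}$ with determinant Δ is
singular if Δ=0 and non-singular if Δ ≠ 0.

The existence of an inverse matrix $\mathbf{A}^{-1}$ to a given square
matrix $\mathbf{A}$ turns on whether $\mathbf{A}$ is singular or not.
The following definition and theorem establish the position.

DEFINITION: The square matrix $\mathbf{A}$ of order n × n has
determinant Δ. If $\mathbf{A}$ is singular (Δ=0), no inverse
$\mathbf{A}^{-1}$ exists. If $\mathbf{A}$ is non-singular (Δ ≠ 0), then

$$\mathbf{A}^{-1}=\frac{1}{\Delta}\|A_{sr}\|=\frac{1}{\Delta}\begin{Vmatrix}
A_{11} & A_{21} & \ldots & A_{n1}\\
A_{12} & A_{22} & \ldots & A_{n2}\\
\vdots & \vdots & & \vdots\\
A_{1n} & A_{2n} & \ldots & A_{nn}
\end{Vmatrix}$$

where $A_{rs}=(-1)^{r+s}\,|\mathbf{A}_{rs}|$ is the co-factor of
$a_{rs}$ in Δ.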
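Hence $\mathbf{A}^{-1}$ is also a matrix of order n × n. By the
definition, it is obtained by writing the matrix $\|A_{rs}\|$ of the
n × n co-factors, by transposing to $\|A_{sr}\|=\|A_{rs}\|'$ and by
dividing every element by Δ. This requires Δ ≠ 0. To identify
$\mathbf{A}^{-1}$ as the inverse of $\mathbf{A}$ in the sense of a
group under ×:

THEOREM: If $\mathbf{A}$ is non-singular, then $\mathbf{A}^{-1}$ is the
only matrix with the property:

$$\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}.$$

Proof: $\mathbf{A}\mathbf{A}^{-1}=\|a_{rs}\|\,\|A_{sr}\|/\Delta=\|c_{rs}\|$
where

$$c_{rs}=\frac{1}{\Delta}\sum_{t=1}^{n}a_{rt}A_{st}$$

by the multiplication rule for matrices. But
$c_{rs}=\frac{1}{\Delta}\Delta=1$ (r=s) and
$c_{rs}=\frac{1}{\Delta}0=0$ (r ≠ s) by (3) above. Hence
$\|c_{rs}\|=\mathbf{I}$. Hence $\mathbf{A}\mathbf{A}^{-1}=\mathbf{I}$
and similarly $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}$, i.e.
$\mathbf{A}^{-1}$ is one matrix such that
$\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}$.
To show that it is the only such matrix, let $\mathbf{B}$ be any matrix
such that $\mathbf{AB}=\mathbf{BA}=\mathbf{I}$. Then:

$$\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{I}=\mathbf{A}^{-1}(\mathbf{AB})=(\mathbf{A}^{-1}\mathbf{A})\mathbf{B}=\mathbf{IB}=\mathbf{B} \quad(\text{since } \mathbf{A}^{-1}\mathbf{A}=\mathbf{I}).$$

Hence $\mathbf{B}$ must be $\mathbf{A}^{-1}$ and $\mathbf{A}^{-1}$ is
unique. Q.E.D.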
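
The definition of the inverse by co-factors can be followed step by step in code. This is a sketch under stated assumptions: integer (or rational) elements, exact arithmetic with fractions, and hypothetical helper names `minor`, `det`, `cofactor` and `inverse`. It follows the definition literally and is not meant as a practical method for large orders.

```python
from fractions import Fraction

def minor(a, r, s):
    """Matrix obtained from a by deleting row r and column s (0-indexed)."""
    return [row[:s] + row[s + 1:] for i, row in enumerate(a) if i != r]

def det(a):
    """Determinant by expansion along the first row."""
    n = len(a)
    if n == 0:
        return 1   # convention, so that a 1 x 1 matrix has co-factor 1
    if n == 1:
        return a[0][0]
    return sum((-1) ** t * a[0][t] * det(minor(a, 0, t)) for t in range(n))

def cofactor(a, r, s):
    """Co-factor of element (r, s): signed determinant of the minor."""
    return (-1) ** (r + s) * det(minor(a, r, s))

def inverse(a):
    """Transposed matrix of co-factors, every element divided by delta."""
    n = len(a)
    delta = det(a)
    if delta == 0:
        raise ValueError("singular matrix: no inverse exists")
    # Element (r, s) of the inverse is the co-factor of element (s, r).
    return [[Fraction(cofactor(a, s, r), delta) for s in range(n)]
            for r in range(n)]

inv2 = inverse([[1, 2], [0, 1]])
inv3 = inverse([[1, 2, 3], [0, 1, 2], [0, 0, 1]])
```

A singular matrix is rejected before any division takes place, matching the requirement that the determinant be non-zero.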

For non-singular square matrices $\mathbf{A}$, the basic property of
inverses is: $\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}$.
Other properties are easily derived:

$$(\mathbf{A}^{-1})^{-1}=\mathbf{A};\quad (\mathbf{AB})^{-1}=\mathbf{B}^{-1}\mathbf{A}^{-1};\quad (\mathbf{A}')^{-1}=(\mathbf{A}^{-1})'.$$

As a particular case $\mathbf{I}^{-1}=\mathbf{I}$, i.e. the unit matrix
is such that it is the same matrix as its transpose and inverse
($\mathbf{I}=\mathbf{I}'=\mathbf{I}^{-1}$). The square matrix
$\mathbf{A}$ with the property that $\mathbf{A}'=\mathbf{A}$ is a
symmetric matrix, as noted at the end of 13.5. The square matrix
$\mathbf{A}$ with the property that $\mathbf{A}^{-1}=\mathbf{A}'$ is
also of considerable interest. Such a matrix is called orthogonal
(13.9 Ex. 25). The unit matrix $\mathbf{I}$ happens to be both
symmetric and orthogonal.


Given a square non-singular matrix of low order, we can always 
slog out the calculation of the inverse from the definition: 


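(i) Since
$$\begin{Vmatrix} 1 & 2 \\ 0 & 1 \end{Vmatrix} \times \begin{Vmatrix} 1 & -2 \\ 0 & 1 \end{Vmatrix} = \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}$$
by the multiplication rule, $A = \begin{Vmatrix} 1 & 2 \\ 0 & 1 \end{Vmatrix}$ has inverse $\begin{Vmatrix} 1 & -2 \\ 0 & 1 \end{Vmatrix}$.

Checking: $A_{11} = 1$, $A_{12} = 0$, $A_{21} = -2$ and $A_{22} = 1$ for the matrix A
with determinant $|A| = 1$. Hence $A^{-1} = \begin{Vmatrix} 1 & -2 \\ 0 & 1 \end{Vmatrix} / 1$.

(ii) If $A = \begin{Vmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{Vmatrix}$, then $A^{-1} = \begin{Vmatrix} 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{Vmatrix}$.

$A^{-1}$ is derived by calculating co-factors in the determinant $|A| = 1$ as
follows:
$$\begin{array}{lll} A_{11} = 1 & A_{12} = 0 & A_{13} = 0 \\ A_{21} = -2 & A_{22} = 1 & A_{23} = 0 \\ A_{31} = 1 & A_{32} = -2 & A_{33} = 1. \end{array}$$

(iii) $A = \begin{Vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{Vmatrix}$ has inverse $A^{-1} = \begin{Vmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{Vmatrix}$.

This is an example of an orthogonal matrix.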
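The co-factor method used in examples (i) and (ii) can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the function names `minor`, `det` and `inverse` are chosen here, exact fractions are used so that the division by the determinant introduces no rounding, and the determinant is expanded along the first row.

```python
from fractions import Fraction

def minor(m, r, s):
    """Matrix m with row r and column s suppressed (indices start at 0)."""
    return [row[:s] + row[s + 1:] for i, row in enumerate(m) if i != r]

def det(m):
    """Determinant by expansion along the first row; the empty matrix counts as 1."""
    if not m:
        return 1
    return sum((-1) ** s * m[0][s] * det(minor(m, 0, s)) for s in range(len(m)))

def inverse(m):
    """Transpose the matrix of co-factors and divide every element by the determinant."""
    n, d = len(m), det(m)
    if d == 0:
        raise ValueError("singular matrix: no inverse")
    cof = [[(-1) ** (r + s) * det(minor(m, r, s)) for s in range(n)] for r in range(n)]
    # Element (r, s) of the inverse is the co-factor A_sr divided by the determinant.
    return [[Fraction(cof[s][r], d) for s in range(n)] for r in range(n)]

A = [[1, 2, 3], [0, 1, 2], [0, 0, 1]]   # the matrix of example (ii)
A_inv = inverse(A)
print(A_inv)
```

The expansion by co-factors grows very quickly with the order of the matrix, which is the practical reason why other techniques are preferred for large matrices.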

This method of calculation of the inverse $A^{-1}$ of a given matrix A,
from the definition, becomes very laborious as the order of A gets
larger. Various techniques have been designed for inverting a matrix, 
for use on high-speed computers. The problem is important because 
of its application to linear transformations and systems of linear 
equations (13.8 below). 

Consider the set of all square matrices of order n × n with elements
drawn from a field F of scalars. Denote the set by $M_n(F)$. Since sums
and scalar products of square matrices are appropriately defined,
$M_n(F)$ is a vector space over F, the zero matrix being O, the square
matrix of n × n zero elements. Further, any n × n matrix can be
multiplied by any other n × n matrix, and $M_n(F)$ is closed under the
non-commutative operation of multiplication. The associative and
distributive properties hold and there is a unit matrix I of order n × n
such that AI = IA = A for any n × n matrix A. Hence, $M_n(F)$ also
has the structure of a ring, i.e. an additive group with a product
operation satisfying the associative and distributive rules. On the
other hand, $M_n(F)$ falls considerably short of a field. It lacks inverses,
since only non-singular matrices have inverses and the singular
matrices in $M_n(F)$ do not. Further, cancellation is not valid and there
are matrix divisors of zero, as shown in an example in 13.5. So:

THEOREM: The set $M_n(F)$ of all square matrices of a given order n × n
is a vector space over the field of scalars and it has the structure of a ring.
The product operation is associative and distributive, with a unit matrix
I; it lacks inverses and it has zero divisors.
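The two shortcomings named in the theorem can be seen in a small numerical case. The following Python sketch is an illustration only: the matrices A, B and C are chosen here for convenience and are not necessarily those of the example in 13.5, and `matmul` is a hypothetical helper written for this purpose.

```python
def matmul(a, b):
    """Product of two conformable matrices by the inner product rule."""
    return [[sum(a[r][t] * b[t][s] for t in range(len(b))) for s in range(len(b[0]))]
            for r in range(len(a))]

A = [[1, 0], [0, 0]]
B = [[0, 0], [0, 1]]
C = [[0, 1], [0, 0]]
ZERO = [[0, 0], [0, 0]]

AB = matmul(A, B)   # a zero product from two non-zero (singular) factors
AC = matmul(A, C)
CA = matmul(C, A)   # differs from AC, so multiplication is non-commutative
print(AB, AC, CA)
```

Since AB = O with neither A nor B equal to O, the product cannot be cancelled by either factor, which is the failure of cancellation mentioned above.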


Square matrices, therefore, have most (though not all) of the
desirable properties for three operations: sums, scalar products and
products. $M_n(F)$ is called the total matrix algebra.

An improvement is achieved, as far as the product operation goes,
when a more limited set of square matrices is taken: the set $L_n(F)$ of
all non-singular square matrices of given order n × n, a subset of
$M_n(F)$. Since inverses now exist for all matrices of $L_n(F)$, and
(consequently) cancellation becomes valid, the set is a non-commutative
group under ×. It is called a full linear group:


THEOREM: The set $L_n(F)$ of all non-singular square matrices of given
order n × n is a full linear group, i.e. a non-commutative group under
the group operation of multiplication of matrices with a unit matrix I.


Against this, there is a loss to record: the set $L_n(F)$ loses its standing
as an additive group and hence as a vector space. In fact, it is not
closed under addition since two non-singular matrices can add to a
singular matrix, i.e. to a matrix outside $L_n(F)$. An example is given
in 13.9 Ex. 14. It is, however, enough to quote the fact that a
non-singular A and its non-singular negative (−A) must add to the
singular matrix O.
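The gain and the loss can both be checked on the pair of 2 × 2 matrices quoted in 13.9 Ex. 14. The Python sketch below is illustrative only; the helper names `det2`, `add` and `matmul` are assumptions made here, and the check covers only this one pair, not the general statement.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add(a, b):
    """Sum of two matrices of the same order."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[r][t] * b[t][s] for t in range(2)) for s in range(2)] for r in range(2)]

P = [[2, 1], [-2, 0]]     # non-singular: determinant 2
Q = [[-1, -2], [0, 2]]    # non-singular: determinant -2

S = add(P, Q)             # the sum falls outside the set: determinant 0
PQ = matmul(P, Q)         # the product stays inside the set: determinant -4
print(det2(P), det2(Q), det2(S), det2(PQ))
```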


13.7. The rank of a matrix. A matrix $A = \|a_{rs}\|$ of order m × n has
elements from a field F of scalars. Consider the m rows of A as n-tuple
vectors:
$$v_r = (a_{r1}, a_{r2}, \ldots a_{rn}) \quad\text{for } r = 1, 2, \ldots m.$$

A vector space V over F is generated by taking all linear combinations
of the set $\{v_1, v_2, \ldots v_m\}$ of m vectors. Let the dimension of
this space be p. Since all vectors of V are linearly dependent on the
set of m vectors, and since p is the largest number of linearly
independent vectors in V, it follows that p cannot exceed m. A set of p
linearly independent vectors is included within the set $\{v_1, v_2, \ldots v_m\}$.
Now p = m is possible, implying that the row vectors $v_1, v_2, \ldots v_m$ of
A are linearly independent. Equally, p < m is possible, implying that
the m row vectors of A are themselves linearly dependent, i.e. linear
combinations of some smaller set of p row vectors of A. The integer
p is called the rank of the matrix A.


DEFINITION: The rank p of the matrix A of order m × n is the dimension
of the vector space of n-tuples generated by the rows of A. Then
$p \leq m$, where p = m implies that the rows of A are linearly independent
and where p < m implies linear dependence among these rows.

Hence, in general, there are p linearly independent rows in A and 
m — p rows left over as dependent on them.* 
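For a numerical matrix the rank as defined here can be found by elimination on the rows: the number of non-zero rows left at the end is the number of linearly independent rows. The Python sketch below is one way of doing this, not a method given in the text; the function name `rank` is chosen here and exact fractions are used to avoid rounding. The first test matrix is the one that appears later in example (iii) of 13.8.

```python
from fractions import Fraction

def rank(m):
    """Row rank: number of pivot rows found by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in m]
    rk = 0
    for col in range(len(rows[0])):
        # Look for a row, not yet used as a pivot, with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

print(rank([[1, -2, 3], [2, -1, 4], [1, 1, 1]]))
print(rank([[1, 2, 3], [0, 1, 2], [0, 0, 1]]))
```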

Rank can be expressed in terms of determinants. The connection 

arises as a consequence of two basic results. 


THEOREM: If $A = \|a_{rs}\|$ is of order n × n, then the determinant
$|A| = 0$ if and only if the rows of A are linearly dependent.


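Proof: if the rows $v_1, v_2, \ldots v_n$ as n-tuple vectors are linearly
dependent, then one at least (say $v_1$) is a linear combination of the
others:

$$v_1 = \lambda_2v_2 + \lambda_3v_3 + \ldots + \lambda_nv_n \quad\text{for some scalars } \lambda_2, \lambda_3, \ldots \lambda_n.$$
Writing the n-tuples as $v_r = (a_{r1}, a_{r2}, \ldots a_{rn})$ and separating off the
sth components:

$$a_{1s} = \lambda_2a_{2s} + \lambda_3a_{3s} + \ldots + \lambda_na_{ns} \quad\text{for } s = 1, 2, \ldots n \qquad (1)$$
From (3) of 13.6 for rows:
$$|A| = \sum_{s=1}^{n} a_{1s}A_{1s} \quad\text{and}\quad \sum_{s=1}^{n} a_{2s}A_{1s} = \sum_{s=1}^{n} a_{3s}A_{1s} = \ldots = 0.$$

By (1):
$$|A| = \sum_{s=1}^{n} a_{1s}A_{1s} = \lambda_2\sum_{s=1}^{n} a_{2s}A_{1s} + \lambda_3\sum_{s=1}^{n} a_{3s}A_{1s} + \ldots = 0.$$

Hence, linear dependence of rows of A implies $|A| = 0$.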

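Take the converse proposition, and show that $|A| = 0$ implies linear
dependence of rows of A. If it happens that all the co-factors $A_{rs}$ are
zero, in producing $|A| = 0$, then it is clear that there is linear dependence
in the rows of the co-factors and even more so in the rows of A. To
illustrate this, take the case n = 3:

$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = 0; \quad A_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = 0; \quad -A_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = 0; \ \ldots$$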

* This development lacks symmetry and it is incomplete. There is no reason why it
should be expressed (as here) in terms of rows as opposed to columns of A. It can be
shown (15.9 below) that the same result is obtained for columns as for rows. Let the n
columns of A and their linear combinations form a vector space of dimension p',
where $p' \leq n$. The basic result is that p = p', i.e. the row rank and the column rank of a
matrix are the same. Further, p is at most the smaller of m and n, and A has p linearly
independent and m − p linearly dependent rows, and p linearly independent and n − p
linearly dependent columns. This result is reached, indirectly, at the end of the present section.




Now $A_{11} = 0$ gives $a_{22}a_{33} - a_{23}a_{32} = 0$ and either one or both rows of
$A_{11}$ consist of zero elements or one row of $A_{11}$ is proportional to the
other (i.e. $a_{22} = \lambda a_{32}$ and $a_{23} = \lambda a_{33}$ for some λ). If this holds for all the
co-factors, then A must either have at least two rows (or columns) of
zero elements or have each row proportional to another row. There
is plenty of linear dependence among the rows of A. It remains to
show that there is linear dependence even when the co-factors are
not all zero. Suppose $|A| = 0$ but one or more of the first column of
co-factors ($A_{11}, A_{21}, \ldots A_{n1}$) are non-zero. From (3) of 13.6 for
columns:

$$\sum_{r=1}^{n} a_{r1}A_{r1} = |A| = 0 \quad\text{and}\quad \sum_{r=1}^{n} a_{r2}A_{r1} = \sum_{r=1}^{n} a_{r3}A_{r1} = \ldots = 0.$$

Combining $(a_{r1}, a_{r2}, \ldots a_{rn})$ into the vector $v_r$:

$$\sum_{r=1}^{n} v_rA_{r1} = 0 \quad\text{or}\quad A_{11}v_1 + A_{21}v_2 + \ldots + A_{n1}v_n = 0$$
for multiples which are not all zero. Hence, if $|A| = 0$, the rows
$v_1, v_2, \ldots v_n$ of A are linearly dependent. Q.E.D.
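The second half of the proof is constructive: when the determinant is zero, the first-column co-factors themselves serve as the multiples in a linear relation between the rows. The Python sketch below checks this on one singular matrix (the matrix of example (iii) of 13.8); the function names are chosen here for illustration and the check is not a substitute for the proof.

```python
def det(m):
    """Determinant by expansion along the first row; the empty matrix counts as 1."""
    if not m:
        return 1
    return sum((-1) ** s * m[0][s] * det([row[:s] + row[s + 1:] for row in m[1:]])
               for s in range(len(m)))

def first_column_cofactors(m):
    """Co-factors A_11, A_21, ... A_n1 of the elements in the first column."""
    return [(-1) ** r * det([row[1:] for i, row in enumerate(m) if i != r])
            for r in range(len(m))]

A = [[1, -2, 3], [2, -1, 4], [1, 1, 1]]
cof = first_column_cofactors(A)
# Component s of the combination A_11 v_1 + A_21 v_2 + A_31 v_3 of the rows.
combo = [sum(c * row[s] for c, row in zip(cof, A)) for s in range(len(A))]
print(det(A), cof, combo)
```

Here the relation reduces to $v_2 = v_1 + v_3$ after dividing through by 5.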


The second result deals with the various square sub-matrices 
which can be got from A of order m x n by suppressing certain rows 
and columns: 


THEOREM: If $A = \|a_{rs}\|$ is of order m × n and rank $p \leq m$, then the
largest square sub-matrix with a non-zero determinant is of order p × p.


Proof: suppose the largest such sub-matrix is B of order k × k. Then
$|B| \neq 0$ and by the previous theorem the rows of B are linearly
independent. Take the k rows of A which include B. These are also
linearly independent; otherwise, if the k rows of A are linearly
dependent, so are the rows of B (and this is not so). Since A has k
linearly independent rows, its rank $p \geq k$. Next, take a square
sub-matrix C of order (k + 1) × (k + 1) in A. Since C is bigger than B, we
have $|C| = 0$ and by the previous theorem the rows of C are linearly
dependent. The k + 1 rows of A which include C are also linearly
dependent; otherwise, if they are linearly independent, so are the
rows of C (which is not so). Since A has k + 1 linearly dependent rows,
its rank $p < k + 1$, i.e. $p \leq k$. So $p \geq k$ and $p \leq k$, i.e. p = k. Q.E.D.
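The theorem suggests a direct, if laborious, way of finding the rank of a small matrix: try square sub-matrices of decreasing order until one with a non-zero determinant turns up. The Python sketch below does exactly that by brute force. It is meant only as an illustration; the name `rank_by_minors` is chosen here and the search is far too slow for matrices of any size.

```python
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row; the empty matrix counts as 1."""
    if not m:
        return 1
    return sum((-1) ** s * m[0][s] * det([row[:s] + row[s + 1:] for row in m[1:]])
               for s in range(len(m)))

def rank_by_minors(m):
    """Order of the largest square sub-matrix with a non-zero determinant."""
    rows, cols = len(m), len(m[0])
    for k in range(min(rows, cols), 0, -1):
        for rs in combinations(range(rows), k):       # rows kept
            for cs in combinations(range(cols), k):   # columns kept
                sub = [[m[r][c] for c in cs] for r in rs]
                if det(sub) != 0:
                    return k
    return 0

print(rank_by_minors([[1, -2, 3], [2, -1, 4], [1, 1, 1]]))
```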

The implication of this result is that there is at least one
non-singular sub-matrix of order p in A but no non-singular sub-matrix
of order greater than p. All square sub-matrices of A with more than
p rows and columns are singular. Hence, the rank p of A is both the
largest number of linearly independent rows of A and the order of the
largest non-singular sub-matrix in A. By the symmetry of this
second property, it follows that p must arise equally from the rows
and the columns of A, i.e. p is also the largest number of linearly
independent columns of A. Further, it follows that $p \leq n$ as well as
$p \leq m$.

For square matrices, the results can be summarised:


THEOREM: A is of order n × n and of rank $p \leq n$. Then p = n implies
that A is non-singular and that all rows (or columns) are linearly
independent; p < n implies that A is singular and that n − p of the rows
(or columns) are dependent on the other p.


For matrices which are not square, we can make the following
statements. If A has m < n, i.e. fewer rows than columns, then p must be
less than n and there are surplus columns in A (n − p of them
depending on the other p). If A has n < m, i.e. fewer columns than rows,
then p must be less than m and there are surplus rows in A (m − p of
them depending on the other p).


13.8. Solution of linear equations. A general assault on the problem
of 13.3 can now be made. Write:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= b_2 \\ \ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots & \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n &= b_m \end{aligned} \qquad (1)$$

as a system of m linear equations in n real variables $x_1, x_2, \ldots x_n$.
Once a solution of these equations is obtained, i.e. values of the x's
in terms of the given b's (and the a's), then the parallel problem of
inverting a linear transformation is also solved. Replace the b's in (1)
by variables $y_1, y_2, \ldots y_m$ and (1) becomes the linear transformation
from the n-tuple $(x_1, x_2, \ldots x_n)$ to the m-tuple $(y_1, y_2, \ldots y_m)$. The
solution of (1) becomes a set of relations giving the x's in terms of the
y's (and the a's), i.e. the inverse linear transformation from the
m-tuple $(y_1, y_2, \ldots y_m)$ to the n-tuple $(x_1, x_2, \ldots x_n)$.

The system (1) condenses neatly in the matrix notation. Write
$A = \|a_{rs}\|$ for the matrix of order m × n made up from the coefficients
on the left of (1), and write $x = \{x_1\; x_2\; \ldots\; x_n\}$ for the column vector of
the variable x's. The product of the m × n matrix A and the n × 1
matrix x is a matrix of order m × 1, i.e. another column vector
comprising m components. By the inner product rule for matrix
multiplication, this product Ax has m components which are
precisely the expressions on the left of (1). Hence the column vector is
the same as $b = \{b_1\; b_2\; \ldots\; b_m\}$, the column vector of given b's in (1). So:

$$Ax = b \qquad (2)$$

is the system of linear equations (1) in matrix notation.
Given the matrix A and the vector b in (2), the problem is to find
the vector x. In general, A is of order m × n and of rank p.

Main case: m = n = p.

The system has as many equations as variables (both n). The matrix
A is square (order n × n) and, its rank being n, it is non-singular. The
determinant $|A| \neq 0$ and the rows and columns of A are linearly
independent. In this (very tidy) case, the system of equations has a
unique solution vector x, i.e. the variables $x_1, x_2, \ldots x_n$ are given
uniquely in terms of the given coefficients (the a's) and the given
constants (the b's). The solution is obtained:

Since A is non-singular, write its inverse $A^{-1}$. Pre-multiply each
side of the matrix equation (2) by $A^{-1}$: $A^{-1}Ax = A^{-1}b$. But

$$A^{-1}Ax = Ix = x.$$

Hence:
$$x = A^{-1}b \qquad (3)$$
Then (3) is the required solution of (2). To summarise:


THEOREM: The solution of the linear equations Ax = b, where A is
non-singular, is the unique vector $x = A^{-1}b$.


The solution in practice requires only some convenient methods of 
inverting the matrix A, suitable for calculation by hand, by desk 
machines or by computer. Such methods are available. When n is 
small, the equations can be solved by a process of eliminating 
variables in succession until only one is left, and there is also a fairly 
convenient formula for use when n is not large (13.9 Ex. 28). Two 
simple examples illustrate: 


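(i)
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 2 \\ x_2 + 2x_3 &= 1 \\ x_3 &= 1 \end{aligned} \quad\text{with}\quad A = \begin{Vmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{Vmatrix} \quad\text{and}\quad A^{-1} = \begin{Vmatrix} 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{Vmatrix}$$
as obtained in example (ii) of 13.6. The solution for $x = \{x_1\; x_2\; x_3\}$ is:

$$x = A^{-1}b = \begin{Vmatrix} 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{Vmatrix} \times \begin{Vmatrix} 2 \\ 1 \\ 1 \end{Vmatrix} = \begin{Vmatrix} 1 \\ -1 \\ 1 \end{Vmatrix} \quad\text{i.e. } x_1 = 1,\ x_2 = -1,\ x_3 = 1.$$

This can be checked by finding $x_3$, $x_2$ and $x_1$ in succession, taking the
equations in order from the last to the first.

(ii)
$$\begin{aligned} -x_1 + x_2 + x_3 &= a \\ x_1 - x_2 + x_3 &= b \\ x_1 + x_2 - x_3 &= c \end{aligned} \quad\text{with}\quad A = \begin{Vmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{Vmatrix} \quad\text{and}\quad A^{-1} = \begin{Vmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{Vmatrix}$$
as can be found from the definition of an inverse matrix. Hence:
$$x = A^{-1}b = \begin{Vmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{Vmatrix} \times \begin{Vmatrix} a \\ b \\ c \end{Vmatrix} = \begin{Vmatrix} \tfrac12(b + c) \\ \tfrac12(c + a) \\ \tfrac12(a + b) \end{Vmatrix}$$

i.e. $x_1 = \tfrac12(b + c)$, $x_2 = \tfrac12(c + a)$ and $x_3 = \tfrac12(a + b)$. This can be checked by
eliminating $x_2$ and $x_3$ in succession and getting $x_1 = \tfrac12(b + c)$ in the end.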
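Both examples can be verified mechanically by forming the product $A^{-1}b$ and substituting the result back into Ax = b. In the Python sketch below the helper `matvec` is a name chosen here, and in example (ii) the constants a, b, c are given the illustrative values 1, 2, 3, which do not come from the text.

```python
from fractions import Fraction

def matvec(m, v):
    """Product of a matrix and a column vector by the inner product rule."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

# Example (i): A and its inverse as given in the text, with b = {2 1 1}.
A1 = [[1, 2, 3], [0, 1, 2], [0, 0, 1]]
A1_inv = [[1, -2, 1], [0, 1, -2], [0, 0, 1]]
b1 = [2, 1, 1]
x1 = matvec(A1_inv, b1)

# Example (ii): illustrative values a = 1, b = 2, c = 3 (an assumption made here).
h = Fraction(1, 2)
A2 = [[-1, 1, 1], [1, -1, 1], [1, 1, -1]]
A2_inv = [[0, h, h], [h, 0, h], [h, h, 0]]
b2 = [1, 2, 3]
x2 = matvec(A2_inv, b2)

print(x1, matvec(A1, x1))   # the solution, then the check that A x reproduces b
print(x2, matvec(A2, x2))
```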


Degenerate cases: p < m or p < n or both.

In all cases, other than the main case, the rank p of A must be less
than one or other or both of m and n. The cases are all lumped together
with the label 'degenerate' since there is always some departure from
the simplicity of solution which characterises the main case. The
following assortment of cases is covered:

(a) p < m = n. There are as many equations as variables but the
square matrix A of order n × n is singular, p being less than n. The
number of linearly independent rows (or columns) of A is p and there
are n − p rows (or columns) left over as dependent on them.

(b) $p \leq m < n$. There are fewer equations than variables and the
matrix A is not square. Here p < n must hold, i.e. A has p linearly
independent columns and n − p left over as dependent on them. This
is the case of 'surplus' variables.

(c) $p \leq n < m$. There are fewer variables than equations and A is not
square. It must be that p < m so that A has m − p rows dependent on a
set of p linearly independent rows. This is the case of 'surplus'
equations.


The first question to decide is how the vector $b = \{b_1\; b_2\; \ldots\; b_m\}$ of
given constants relates to the matrix A. There are n columns in A,
each an m-tuple vector, and n − p of them are linearly dependent on
the p others. Another m-tuple vector b is now added to the list. In the
main case, where A has n columns of n-tuples, all linearly independent,
the columns provide a basis for any set of n-tuples. Hence any
additional n-tuple, such as b, is automatically a linear combination
of the n columns of A. This is not so in the present degenerate cases,
since the p linearly independent columns of m-tuples are not enough
for a basis. The extra m-tuple b may or may not be linearly dependent
on the columns of A.

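If b is linearly dependent on the columns of A, written as the
m-tuples $u_1, u_2, \ldots u_n$, then scalar multiples $\lambda_1, \lambda_2, \ldots \lambda_n$ exist so that
$\lambda_1u_1 + \lambda_2u_2 + \ldots + \lambda_nu_n = b$. Putting this relation between vectors in
full detail:

$$\begin{aligned} a_{11}\lambda_1 + a_{12}\lambda_2 + \ldots + a_{1n}\lambda_n &= b_1 \\ a_{21}\lambda_1 + a_{22}\lambda_2 + \ldots + a_{2n}\lambda_n &= b_2 \\ \ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots & \\ a_{m1}\lambda_1 + a_{m2}\lambda_2 + \ldots + a_{mn}\lambda_n &= b_m \end{aligned} \qquad (4)$$

Comparing (4) with (1), we see that the system of linear equations
does have at least the solution $x_1 = \lambda_1$, $x_2 = \lambda_2$, ... $x_n = \lambda_n$. On the other
hand, if b is not linearly dependent on the columns of A, then no
scalars $\lambda_1, \lambda_2, \ldots \lambda_n$ exist for (4) and the system of linear equations
must be without solution. So: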


THEOREM: The system of linear equations Ax = b has A of order
m × n and rank p, where p < m or p < n or both; and it has b as an m-tuple
vector not linearly dependent on the m-tuple columns of A. The system
is then inconsistent and there is no solution.


The inconsistency in the system is reflected in the fact that one or 
more of the equations are not consistent with one or more other 
equations. There is something wrong with the whole formulation. 
As an example: 


(iii)
$$\begin{aligned} x_1 - 2x_2 + 3x_3 &= 1 \\ 2x_1 - x_2 + 4x_3 &= 2 \\ x_1 + x_2 + x_3 &= a \end{aligned} \quad\text{with}\quad A = \begin{Vmatrix} 1 & -2 & 3 \\ 2 & -1 & 4 \\ 1 & 1 & 1 \end{Vmatrix} \text{ of rank 2.}$$

The rank of A follows from the fact that $|A| = 0$ but $\begin{vmatrix} 1 & -2 \\ 2 & -1 \end{vmatrix} \neq 0$.
Subtract the first equation from the second: $x_1 + x_2 + x_3 = 1$. This is
consistent with the third equation only if a = 1, in which case we
continue to seek a solution. If a has any other value, e.g. a = 0, then
the system of equations has no solution. In terms of linear
dependence, the position is as follows. If $u_1$, $u_2$ and $u_3$ are the columns of
A, then they are linearly dependent by virtue of the relation:

$$5u_1 - 2u_2 - 3u_3 = 0.$$

One of the columns is dependent on the other two. If the vector of
constants is b = {1 2 1} corresponding to a = 1, then b is also linearly
dependent on the columns of A. In fact, $b = u_1 = 1 \times u_1 + 0 \times u_2 + 0 \times u_3$.
But any other b such as {1 2 0} is not linearly dependent in this way.
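Whether b is linearly dependent on the columns of A can be tested numerically by comparing the rank of A with the rank of A enlarged by the extra column b: the two agree exactly when b adds nothing new to the space generated by the columns. This rank comparison is a standard restatement of the condition in the theorem rather than a procedure given in the text; the Python sketch below applies it to example (iii), with function names chosen here.

```python
from fractions import Fraction

def rank(m):
    """Rank by Gaussian elimination on the rows, using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def consistent(a_matrix, b):
    """True when b is linearly dependent on the columns of a_matrix."""
    augmented = [row + [rhs] for row, rhs in zip(a_matrix, b)]
    return rank(a_matrix) == rank(augmented)

A = [[1, -2, 3], [2, -1, 4], [1, 1, 1]]
print(consistent(A, [1, 2, 1]))   # the case a = 1
print(consistent(A, [1, 2, 0]))   # the case a = 0
```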

To continue, we suppose that b is checked to be linearly dependent
on the columns of A and that we can look for a solution of the
equations. The general position is that p is less than either or both of
m and n. There are m − p rows of A dependent on the set of p linearly
independent rows. This means that there are m − p surplus equations,
dependent on or derivable from the others. These surplus equations
are to be ignored. There are n − p columns of A dependent on the set
of p linearly independent columns. To correspond, there are p
variables to use and n − p surplus variables to which any values
whatever can be assigned. Hence the system is to be viewed as giving
only p variables (with the other n − p assigned any values) by use of
only p equations (the other m − p being ignored). So:


THEOREM: The system of linear equations Ax = b has A of order
m × n and rank p, where p < m or p < n or both; and it has b linearly
dependent on the columns of A. The system is consistent and the solution
is got by finding p of the variables from p of the equations. The other n − p
variables are assigned arbitrary values, and m − p equations are ignored
as derivable from the p equations.


Of the three kinds of degenerate cases, (a) has an equal number (n)
of equations and variables but p < n, so that there are both surplus
variables and surplus equations. In (b), there are fewer equations
than variables; there may be no surplus equations but there must be
surplus variables. In (c), there are fewer variables than equations and,
though there may not be surplus variables, there must be surplus
equations. Each case is illustrated in the examples:
(iv)
$$\begin{aligned} x_1 - 2x_2 + 3x_3 &= 1 \\ 2x_1 - x_2 + 4x_3 &= 2 \\ x_1 + x_2 + x_3 &= 1 \end{aligned}$$
which is the consistent case of (iii) above.
The third equation follows from the other two, by subtracting the
first from the second, and it can be ignored. Fix a value for $x_3$ and
write the equations:
$$x_1 - 2x_2 = 1 - 3x_3 \quad\text{and}\quad 2x_1 - x_2 = 2(1 - 2x_3)$$
giving $x_1 = 1 - \tfrac53x_3$ and $x_2 = \tfrac23x_3$ in terms of $x_3$.

(v)
$$\begin{aligned} 2x_1 + 4x_2 + x_3 &= -1 \\ x_1 + 2x_2 + 2x_3 &= 1 \end{aligned} \quad\text{with}\quad A = \begin{Vmatrix} 2 & 4 & 1 \\ 1 & 2 & 2 \end{Vmatrix} \text{ of rank 2.}$$
There is a surplus variable, but no surplus equation. To see which
variable to take as surplus, and to get an assigned value: take twice
the second equation and subtract the first, giving $x_3 = 1$. Either
equation then gives $x_1 + 2x_2 = -1$, i.e. either $x_1 = -(1 + 2x_2)$ with $x_2$
treated as surplus, or $x_2 = -\tfrac12(1 + x_1)$ with $x_1$ so treated. So, if $x_2$ is
assigned any value, the equations give: $x_1 = -(1 + 2x_2)$ and $x_3 = 1$.

(vi)
$$\begin{aligned} 2x_1 + x_2 &= 0 \\ x_1 + 2x_2 &= -3 \\ x_1 - x_2 &= 3 \end{aligned} \quad\text{with}\quad A = \begin{Vmatrix} 2 & 1 \\ 1 & 2 \\ 1 & -1 \end{Vmatrix} \text{ of rank 2.}$$
There is a surplus equation, any one of the three. For example, the
second equation is derived by subtracting the last from the first. If it is
ignored, then $2x_1 + x_2 = 0$ and $x_1 - x_2 = 3$ solve to give: $x_1 = 1$, $x_2 = -2$.
This is the solution of the system of three equations, any one of them
being ignored as implied by the others.
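The solutions quoted in examples (iv), (v) and (vi) can be checked by substitution into every equation of each system, including the equations that were ignored as surplus. The Python sketch below does this; the helper `satisfies` and the trial values given to the surplus variable are choices made here for illustration.

```python
from fractions import Fraction

def satisfies(m, x, b):
    """True when the vector x satisfies every equation of the system m x = b."""
    return all(sum(a * v for a, v in zip(row, x)) == rhs for row, rhs in zip(m, b))

A4, b4 = [[1, -2, 3], [2, -1, 4], [1, 1, 1]], [1, 2, 1]   # example (iv)
A5, b5 = [[2, 4, 1], [1, 2, 2]], [-1, 1]                  # example (v)
A6, b6 = [[2, 1], [1, 2], [1, -1]], [0, -3, 3]            # example (vi)

# Arbitrary trial values for the surplus variable (x3 in (iv), x2 in (v)).
trial = [Fraction(t) for t in (-2, 0, 1, 7)]

ok4 = all(satisfies(A4, [1 - Fraction(5, 3) * t, Fraction(2, 3) * t, t], b4) for t in trial)
ok5 = all(satisfies(A5, [-(1 + 2 * t), t, 1], b5) for t in trial)
ok6 = satisfies(A6, [1, -2], b6)
print(ok4, ok5, ok6)
```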


13.9. Exercises 


1. Take the real numbers x as a vector space V over the field F of real
numbers and show that it is a case of $V_n(F)$, the vectors being n-tuples (n = 1).
Deduce that the space has dimension 1 and that the real number 1 can serve
as a basis. Why is this different from example (i) of 13.2?

2. Show that, as a space $V_2(F)$ over the field F of real numbers, complex
numbers have the pair 1 and i as a basis: $z = x \times 1 + y \times i$.

3. Polynomials as vector spaces. Extend example (iii) of 13.2, showing that
the set of all cubics is a space $V_4(F)$, the set of all quartics a space $V_5(F)$, and
so on. Then show that the set F[x] of all polynomials is a vector space of
infinite dimension.


4. Consider the 3 × 3 linear transformation of 13.3. Show that, by
elimination of $x_2$ and $x_3$, $x_1$ can be found in terms of the y's, provided that $\Delta \neq 0$ where:
$$\Delta = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).$$

Then:
$$x_1 = \frac{1}{\Delta}\{y_1(a_{22}a_{33} - a_{23}a_{32}) - y_2(a_{12}a_{33} - a_{13}a_{32}) + y_3(a_{12}a_{23} - a_{13}a_{22})\}.$$

Find $x_2$ and $x_3$ similarly.

5. Apply the same method to the 2 × 3 linear transformation, (4) of 13.3,
and show that

$$x_1 = \frac{1}{\Delta}\{a_{22}y_1 - a_{12}y_2 + (a_{12}a_{23} - a_{13}a_{22})b_3\}$$

$$x_2 = \frac{1}{\Delta}\{-a_{21}y_1 + a_{11}y_2 + (a_{13}a_{21} - a_{11}a_{23})b_3\}$$

provided that $x_3 = b_3$ is assigned and that $\Delta = a_{11}a_{22} - a_{12}a_{21} \neq 0$. Examine
similarly (5) of 13.3, showing that $\Delta$ must again be non-zero and that $y_1$, $y_2$
and $y_3$ must satisfy a certain relation.

*6. Jacobians. If $u_1, u_2, \ldots u_m$ are each a function of a real variable x, the
derivatives make up a row or column vector. Generalise to the Jacobian
$J = \left\|\dfrac{\partial u_r}{\partial x_s}\right\|$, where each u is a function of n real variables. This matrix is
named after Jacobi (1804-51).


7. Show that n inner products $\sum_{r=1}^{m} a_rb_{rs}$, for s = 1, 2, ... n, can be written
from a row vector $(a_1\; a_2\; \ldots\; a_m)$ and a matrix $\|b_{rs}\|$ of order m × n. Write the
inner products for the same matrix and a column vector $\{c_1\; c_2\; \ldots\; c_n\}$.

8. Products of vectors and matrices. Use Ex. 7 to show that AB exists where
A is a row vector of order 1 × m and B a matrix of order m × n, and where A
is a matrix of order m × n and B a column vector of order n × 1. Hence interpret
and express in full: Ax for $A = \|a_{rs}\|$ (r = 1, 2, ... m; s = 1, 2, ... n) and
$x = \{x_1\; x_2\; \ldots\; x_n\}$.

*9. In n-dimensional Euclidean space (generalising 8.4), show that the
length |a| of a vector a is given by $|a|^2 = a \cdot a$, and that the angle α between
vectors a and b is given by $\cos\alpha = \dfrac{a \cdot b}{|a|\,|b|}$.

10. Negative and non-negative matrices. Two notations can be used for
'greater than or equal to' applied to $A = \|a_{rs}\|$: $A \geq O$ meaning $a_{rs} \geq 0$ all r
and s (all $a_{rs} = 0$ not allowed); $A \geqq O$ meaning $a_{rs} \geq 0$ all r and s (all $a_{rs} = 0$
allowed). In both cases, A can be described as non-negative, i.e. whether
A = O is allowed or not. Show that it is still not true to say that non-negative
$A \geqq O$ and negative $A < O$ cover all cases. (Note: some $a_{rs} > 0$ and some $a_{rs} < 0$
is a possibility.)

11. Show that $\begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix} \times \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix} = \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}$ and deduce that $I = I^2 = I^3 = \ldots$.
Show that $A^r = A \times A \times \ldots \times A$ (r times) is defined if and only if A is square.

12. Suppose both AB and BA exist. Show that both products are square
matrices but that they can be of different orders. Illustrate by multiplying
$\begin{Vmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{Vmatrix}$ and $\begin{Vmatrix} -2 & 1 & 0 \\ -3 & 0 & 1 \end{Vmatrix}$ to get a 2 × 2 or a 3 × 3 matrix according to the order
of multiplication.

13. Products of vectors. Two vectors $x = (x_1, x_2, \ldots x_n)$ and $y = (y_1, y_2, \ldots y_n)$
have inner product $x \cdot y = \sum_{s=1}^{n} x_sy_s$. In the matrix notation, show that x can be
written as a column vector x and, by transposing, as a row vector x'. Then
show that the inner product $x \cdot y = x'y = y'x$, as matrix products. On the other
hand, show that xy' and yx' are n × n matrices, one the transpose of the other.

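14. Illustrate that non-singular matrices can sum to singular matrices by
showing that $\begin{Vmatrix} 2 & 1 \\ -2 & 0 \end{Vmatrix} + \begin{Vmatrix} -1 & -2 \\ 0 & 2 \end{Vmatrix} = \begin{Vmatrix} 1 & -1 \\ -2 & 2 \end{Vmatrix}$.

15. If $A = \begin{Vmatrix} 2 & 1 \\ -1 & 0 \end{Vmatrix}$ and $B = \begin{Vmatrix} 0 & -1 \\ 1 & 2 \end{Vmatrix}$, show that AB = BA = I (A and B
inverse) and that A + B = 2AB.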
16. If $A = \|a_{rp}\|$, $B = \|b_{pq}\|$, $C = \|c_{qs}\|$ for r = 1, 2, ... m, p = 1, 2, ... j,
q = 1, 2, ... k and s = 1, 2, ... n, show that AB and (AB)C exist and specify their
orders. By showing that $\sum_{p=1}^{j} a_{rp}b_{pq}$ is the (r, q)th element of AB, deduce that
the general element of (AB)C is $\sum_{q=1}^{k}\sum_{p=1}^{j} a_{rp}b_{pq}c_{qs}$. Hence show (AB)C = A(BC).

17. If A is of order m × k and both B and C of order k × n, show that A(B + C)
exists and equals AB + AC.

18. Products with unit matrices. Show that AB =IAB = AIB = ABI provided 
only that A conforms with B, but that I varies its order from one appearance 
to the next, except that the relations are quite unambiguous if A, B and I are 
all of order n xn. 

19. Under what conditions are the relations AO = OA = O valid for the
product of a given matrix A and a zero matrix? 

20. Symmetric and skew-symmetric matrices. $\|a_{rs}\|$ of order n × n is
symmetric if $a_{rs} = a_{sr}$, and skew-symmetric if $a_{rs} = -a_{sr}$, all r and s = 1, 2, ... n.
Show that the leading diagonal can consist of any elements in the first case,
but must comprise all zeros in the second. If A' = A, show that A is symmetric;
if A' = −A show that A is skew-symmetric.

*21. There are n partial derivatives $\dfrac{\partial u}{\partial x_r}$ for a function u of n real variables.
Write second-order partial derivatives, illustrate from actual functions that
they are symmetric and hence write the symmetric matrix $H = \left\|\dfrac{\partial^2 u}{\partial x_r\,\partial x_s}\right\|$.
This is called a Hessian.

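22. If A is of order n × n and if B = λA, show that $|B| = \lambda^n|A|$ and deduce
that the singularity (or otherwise) of A is not affected by scalar multiplication.

23. Division of matrices. Show that non-singular matrices of the same order
can be divided but that division is non-commutative: $AB^{-1}$ is one division of
A by B and $B^{-1}A$ another, where in general $AB^{-1} \neq B^{-1}A$.

24. If the scalars $\lambda_1, \lambda_2, \ldots \lambda_n$ are all non-zero, show that
$$A = \begin{Vmatrix} \lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & \lambda_n \end{Vmatrix} \quad\text{has}\quad |A| = \lambda_1\lambda_2\ldots\lambda_n \quad\text{and}\quad A^{-1} = \begin{Vmatrix} 1/\lambda_1 & 0 & \ldots & 0 \\ 0 & 1/\lambda_2 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & 1/\lambda_n \end{Vmatrix}.$$

*25. Orthogonal matrices. $A = \|a_{rs}\|$ is orthogonal if $A^{-1} = A'$, i.e. if AA' = I.
Write the (r, s)th element of AA' as an inner product and show that $\sum_{t=1}^{n} a_{rt}^2 = 1$
for each r and that $\sum_{t=1}^{n} a_{rt}a_{st} = 0$ for each r and s ($r \neq s$). Interpret these relations
and illustrate with $A = \begin{Vmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{Vmatrix}$.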
26. Establish that 

2 13/], || 213 || and |} 2 1 are each of rank 2, 
1 283 123 2 1 
1 -10 1 -1 

but that both 1 -2 3 || and || 1 2 3 || are of rank 1. 

Pee we bee 


*27. If A and B are of given order m xn, show that the relation ‘A and B 
have the same rank’ is an equivalence relation which serves to partition the 
set of all such matrices into equivalence classes according to rank. 

28. Cramer's rule. If A is non-singular and of order n × n, show that Ax = b
has solution
$$x_s = \sum_{r=1}^{n} b_rA_{rs} \Big/ \sum_{r=1}^{n} a_{rs}A_{rs} \quad\text{for } s = 1, 2, \ldots n$$
where $A_{rs}$ is the co-factor of $a_{rs}$. (To prove: multiply the n equations by
$A_{1s}, A_{2s}, \ldots A_{ns}$ respectively, add and use (3) of 13.6.) Apply the rule to
example (ii) of 13.8. The rule is named after Cramer (1704-52).

29. In the light of the results of 13.8, re-examine the solution of

$$a_{11}x_1 + a_{12}x_2 = b_1 \quad\text{and}\quad a_{21}x_1 + a_{22}x_2 = b_2.$$

Show that the degenerate cases are of two kinds:

(1) $\dfrac{a_{11}}{a_{21}} = \dfrac{a_{12}}{a_{22}} \neq \dfrac{b_1}{b_2}$ where the equations are inconsistent (no solution)

(2) $\dfrac{a_{11}}{a_{21}} = \dfrac{a_{12}}{a_{22}} = \dfrac{b_1}{b_2}$ where the equations are dependent and where a solution
is obtained for $x_1$ (given $x_2$) or conversely. Interpret in terms of linear
dependence between the columns of $A = \begin{Vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{Vmatrix}$ and the vector $b = \begin{Vmatrix} b_1 \\ b_2 \end{Vmatrix}$.

30. Homogeneous equations. Consider Ax = O as a set of n equations in n
variables, where A of order n × n is given and where $x = \{x_1\; x_2\; \ldots\; x_n\}$. Assign a
value of $x_n$, solve for $x_1, x_2, \ldots x_{n-1}$, using the results of 13.8 (degenerate cases)
to show that the solution can be achieved if A is singular. Deduce that Ax = O
has solutions other than $x_1 = x_2 = \ldots = x_n = 0$ only if A is singular and that they
are not unique.


31. Illustrate the result of Ex. 30 by showing that
$$x_1 - 2x_2 + 3x_3 = 0, \quad 2x_1 - x_2 + 4x_3 = 0 \quad\text{and}\quad x_1 + x_2 + x_3 = 0$$
has solutions $-\dfrac{x_1}{5} = \dfrac{x_2}{2} = \dfrac{x_3}{3}$. Deduce that there are unique ratios for the variables
satisfying the three homogeneous equations in this case. Such a result is true
in general when A is of order n × n and rank n − 1.

32. Inversion of a linear transformation. The linear transformation y = Ax, 
where A is of order m xn and rank p, is from the n-tuple x to the m-tuple y. 
Adapt the results of 13.8 to show that the inverse is $x = A^{-1}y$ if A is square and
non-singular (main case) and that otherwise not all the variables can be used 
in inversion (degenerate cases). If p<m and/or p <n, show that the transforma- 
tion is consistent for inversion only if y is linearly dependent on the columns 
of A. Further, if m<n, show that some variables of x must be assigned before 
inversion; if n<m, show that the y’s must satisfy one or more relations. See 
Ex. 5 above. 

*33. Orthogonal transformations. The n × n linear transformation y = Ax is
orthogonal if A is orthogonal (Ex. 25). Show that x = A'y is the inverse, interpret
and illustrate with $A = \begin{Vmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{Vmatrix}$. Orthogonal transformations of $V_n(F)$
into itself have the property that they preserve lengths; the 2 × 2 example
here corresponds to a rotation of axes in two dimensions.

*34. Full linear groups. Show that the linear transformations y = Ax form a
group under × (successive applications) for all non-singular matrices A of
given order n × n. Deduce that this group is isomorphic with $L_n(F)$, the group
of non-singular matrices. Hence, algebraically, non-singular transformations
and non-singular matrices are interchangeable concepts.


CHAPTER 14 


LINEAR SYSTEMS 


14.1 Linear algebraic systems. Linear algebra deals with sets which
have the structure of a vector space and with two operations: sums
and scalar products. A third operation, giving the products of
elements of a set, may also be defined; but it is incidental and
certainly not necessary. Concentrating on sums and scalar products,
consider a linear form: $u = a_1x_1 + a_2x_2 + \ldots + a_nx_n$. The a's are scalars,
taken here as real numbers. The x's can be vectors (e.g. m-tuples) in
which case u is a similar vector. However, take the x's in the present
context as real variables so that u is also a real variable. The vector
notation is always useful; for example, if a is the vector $(a_1, a_2, \ldots a_n)$
and x the vector $(x_1, x_2, \ldots x_n)$, then u is the inner product $a \cdot x$.

What may be regarded as the essential feature of linearity in 
linear algebraic systems? We can develop an additive property 
which, at least, can lay claim to be of the essence of linearity. 
Though the property is perfectly general, the particular case of linear 
forms in two real variables is considered to simplify the exposition. 
Take first a pair of linear forms: 


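$$u = a_1x + b_1y \quad\text{and}\quad v = a_2x + b_2y \qquad (1)$$
This is a linear transformation from pairs (x, y) to pairs (u, v). Then:

THEOREM: If $(x_1, y_1) \to (u_1, v_1)$ and $(x_2, y_2) \to (u_2, v_2)$ under (1), then
$$(\lambda_1x_1 + \lambda_2x_2,\ \lambda_1y_1 + \lambda_2y_2) \to (\lambda_1u_1 + \lambda_2u_2,\ \lambda_1v_1 + \lambda_2v_2)$$
for any scalars $\lambda_1$ and $\lambda_2$ whatever.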


The additive property here is that, once two images are found under
the transformation, other images follow as the sum of the given two,
with any multiples $\lambda_1$ and $\lambda_2$ we care to take. The proof is immediate:

We know: $u_1 = a_1x_1 + b_1y_1$ and $u_2 = a_1x_2 + b_1y_2$.

So: $\lambda_1u_1 + \lambda_2u_2 = \lambda_1(a_1x_1 + b_1y_1) + \lambda_2(a_1x_2 + b_1y_2) = a_1(\lambda_1x_1 + \lambda_2x_2) + b_1(\lambda_1y_1 + \lambda_2y_2)$.

Similarly: $\lambda_1v_1 + \lambda_2v_2 = a_2(\lambda_1x_1 + \lambda_2x_2) + b_2(\lambda_1y_1 + \lambda_2y_2)$. Q.E.D.
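The additive property is easily tried out on numbers. In the Python sketch below the coefficients of the transformation (1), the two pairs and the multiples are all arbitrary values chosen here for illustration; none of them comes from the text, and a numerical check of this kind only illustrates the theorem.

```python
# Hypothetical coefficients for u = a1*x + b1*y and v = a2*x + b2*y.
a1, b1, a2, b2 = 2, -1, 3, 4

def transform(x, y):
    """Image (u, v) of the pair (x, y) under the linear transformation (1)."""
    return (a1 * x + b1 * y, a2 * x + b2 * y)

x1, y1 = 1, 5          # first pair
x2, y2 = -2, 3         # second pair
lam1, lam2 = 7, -4     # any multiples

u1, v1 = transform(x1, y1)
u2, v2 = transform(x2, y2)

image_of_combination = transform(lam1 * x1 + lam2 * x2, lam1 * y1 + lam2 * y2)
combination_of_images = (lam1 * u1 + lam2 * u2, lam1 * v1 + lam2 * v2)
print(image_of_combination, combination_of_images)
```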

Next consider a single equation, a linear form in two variables 
equated to zero. Write it in two ways, without and with an additive 
constant: 

$$ax + by = 0 \qquad (2)$$
$$ax + by + c = 0 \qquad (3)$$
Here (2) is described as an equation in homogeneous form and (3) as
one in non-homogeneous form. Moreover, given any equation (3), the
corresponding homogeneous form (2) can always be written by
dropping the constant $c$.

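Suppose $(x_1, y_1)$ and $(x_2, y_2)$ are any two pairs satisfying the
homogeneous form (2). Then it follows at once that the linear
combination:
$$\lambda_1(x_1, y_1) + \lambda_2(x_2, y_2) \quad\text{i.e. the pair}\quad (\lambda_1x_1 + \lambda_2x_2,\ \lambda_1y_1 + \lambda_2y_2)$$
also satisfies (2), for any $\lambda_1$ and $\lambda_2$ whatever. (The proof is as before.)
In addition to these two pairs satisfying (2), suppose we have a
particular pair $(\bar{x}, \bar{y})$ which satisfies (3). The striking result, then, is
that
$$\lambda_1(x_1, y_1) + \lambda_2(x_2, y_2) + (\bar{x}, \bar{y})$$
i.e. the pair $(\lambda_1x_1 + \lambda_2x_2 + \bar{x},\ \lambda_1y_1 + \lambda_2y_2 + \bar{y})$
also satisfies the non-homogeneous form (3). The proof is again by
substitution:
We know: $ax_1 + by_1 = 0$, $ax_2 + by_2 = 0$ and $a\bar{x} + b\bar{y} + c = 0$.
Substitute $x = \lambda_1x_1 + \lambda_2x_2 + \bar{x}$ and $y = \lambda_1y_1 + \lambda_2y_2 + \bar{y}$ in (3):
$$a(\lambda_1x_1 + \lambda_2x_2 + \bar{x}) + b(\lambda_1y_1 + \lambda_2y_2 + \bar{y}) + c$$
$$= \lambda_1(ax_1 + by_1) + \lambda_2(ax_2 + by_2) + (a\bar{x} + b\bar{y} + c) = 0.$$
The result obtained is:
THEOREM: A solution of $ax + by + c = 0$ is given by:
$$x = \lambda_1x_1 + \lambda_2x_2 + \bar{x} \quad\text{and}\quad y = \lambda_1y_1 + \lambda_2y_2 + \bar{y} \quad (\text{for any } \lambda_1 \text{ and } \lambda_2)$$
where $(x_1, y_1)$ and $(x_2, y_2)$ both satisfy $ax + by = 0$ and where $(\bar{x}, \bar{y})$
satisfies $ax + by + c = 0$.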
The additive property here is that any two solutions of $ax + by = 0$
(added with any multiples) and any solution of $ax + by + c = 0$ can all
be added together for a solution of the linear equation $ax + by + c = 0$.*
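A small numerical check of this result, as a sketch only: the coefficients $a$, $b$, $c$ and the particular pairs below are assumed values chosen for illustration, and the function names are hypothetical.

```python
a, b, c = 2.0, -4.0, 6.0              # assumed coefficients of ax + by + c = 0

def homogeneous(x, y):
    """Left-hand side of the homogeneous form (2)."""
    return a * x + b * y

def non_homogeneous(x, y):
    """Left-hand side of the non-homogeneous form (3)."""
    return a * x + b * y + c

h1, h2 = (2.0, 1.0), (-4.0, -2.0)     # two pairs satisfying 2x - 4y = 0
particular = (1.0, 2.0)               # one pair satisfying 2x - 4y + 6 = 0

def general_solution(lam1, lam2):
    """lam1*h1 + lam2*h2 + particular, as in the theorem."""
    x = lam1 * h1[0] + lam2 * h2[0] + particular[0]
    y = lam1 * h1[1] + lam2 * h2[1] + particular[1]
    return x, y

multiples = [(0.0, 0.0), (1.0, 1.0), (-2.5, 3.0), (7.0, -0.25)]
residuals = [non_homogeneous(*general_solution(l1, l2)) for l1, l2 in multiples]
print(residuals)
```

Every residual vanishes: whichever multiples are taken, the combined pair still satisfies (3).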

The results obtained are, in fact, quite general and not confined to 
the particular case of two variables considered. Linear forms have an 
additive property: if particular solutions are found, then a general 
solution can be written by adding the particular solutions, with any 
multiples whatever. 

There is one line of thought which is often followed to the
conclusion that linearity is a very special and limited case. Suppose the
real variable $u$ depends on one or more other variables, e.g. $u$ as a
function of $x$, or as a function of $x$ and $y$. Generally, we write $u = f(x)$
where the form of $f$ is reflected in the shape of the curve which
represents the relation graphically; or $u = f(x, y)$ and the form of $f$
shows up in the shape of the corresponding three-dimensional surface.
Linear functions such as $u = ax + b$ or $u = ax + by + c$ are indeed very
special cases. They correspond to taking a line instead of a curve, a
plane instead of a surface. The contrast is between the linear $u = ax + b$
(a line) and the quadratic $u = ax^2 + bx + c$ (a parabola) or higher-order
polynomials. These are all, in a sense, approximations to a general
function $u = f(x)$ which is 'well-behaved' enough to have derivatives
of all orders, i.e. approximations appropriate for a small neighbourhood
of $x$ around a particular value. These approximations are easily
got by Taylor's series. For small $x$ (around $x = 0$), we have:

$$f(x) = f(0) + f'(0)x + f''(0)\frac{x^2}{2!} + \dots$$

Approximate by neglecting $x^2$ and higher powers and

$$f(x) = f(0) + f'(0)x,$$

i.e. $u = ax + b$, with $a = f'(0)$ and $b = f(0)$. If $x^3$ and higher powers are
neglected, then $u = ax^2 + bx + c$, with $a = \tfrac{1}{2}f''(0)$, $b = f'(0)$ and $c = f(0)$.
The linear function can, therefore, be regarded as the most severe
approximation to a general function.
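The severity of the linear approximation can be seen in a short numerical sketch. The choice $f(x) = e^x$ (for which $f(0) = f'(0) = f''(0) = 1$) is an assumption made purely for illustration, as are the function names.

```python
import math

def linear_approx(x):
    """f(0) + f'(0) x for the assumed f(x) = exp(x)."""
    return 1.0 + x

def quadratic_approx(x):
    """f(0) + f'(0) x + f''(0) x^2 / 2 for the same f."""
    return 1.0 + x + 0.5 * x * x

errors = []
for x in (0.01, 0.1, 0.5):
    exact = math.exp(x)
    errors.append((x, abs(exact - linear_approx(x)), abs(exact - quadratic_approx(x))))

for x, e_lin, e_quad in errors:
    print(f"x={x}: linear error {e_lin:.2e}, quadratic error {e_quad:.2e}")
```

Both errors shrink as $x$ approaches zero, and the quadratic error is the smaller throughout; the linear form is adequate only in a small neighbourhood.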

* An interpretation: the relative position of two points on a line is required to
determine the line's direction, and then one point is enough to fix its position.

The linear relation is, however, not quite as limited as this. If $u$ is
a function of $x$, we can express $\log u$ as well as $u$ in terms of $\log x$ as
well as $x$. In a graphical representation, as an alternative to a graph
on natural scales, we can use semi-logarithmic and logarithmic
graphs (as employed in statistics). The linear function $u = ax + b$
($a$ and $b$ given constants) is a line on natural scales. The function
$u = be^{ax}$ gives: $\log u = ax + \log b$. This is linear on semi-logarithmic
scales; $\log u$ is a linear function of $x$ and $u = be^{ax}$ is shown by a line
when $\log u$ is plotted against $x$. Further, the function $u = bx^a$ gives:
$\log u = a \log x + \log b$. This is linear on logarithmic scales; $\log u$ is
linear in $\log x$ and $u = bx^a$ is a line when $\log u$ is plotted against $\log x$.
Consequently, all the functions $u = ax + b$, $u = be^{ax}$ and $u = bx^a$ can be
regarded as linear, and between them they cover a considerable
range. In particular, while $u = ax + b$ expresses a constant absolute
rate ($a$) of growth of $u$ with respect to $x$, $u = be^{ax}$ is a constant
proportionate rate ($a$) of growth (12.3 above). They are both linear, as
shown in the constant rate of growth.
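The constant slopes on the two kinds of logarithmic scale can be confirmed by a brief sketch; the constants $a = 0.3$, $b = 2$ and the sample points are assumed for illustration, and `slopes` is a hypothetical helper.

```python
import math

def slopes(xs, ys):
    """Slopes of the chords joining successive points (xs[i], ys[i])."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

a, b = 0.3, 2.0                       # assumed constants
xs = [1.0, 2.0, 4.0, 8.0]             # unevenly spaced, all positive

exponential = [b * math.exp(a * x) for x in xs]   # u = b e^(ax)
power = [b * x ** a for x in xs]                  # u = b x^a

# Semi-logarithmic scales: log u against x.
semi_log_slopes = slopes(xs, [math.log(u) for u in exponential])
# Logarithmic scales: log u against log x.
log_log_slopes = slopes([math.log(x) for x in xs], [math.log(u) for u in power])

print(semi_log_slopes, log_log_slopes)
```

In each case every chord has the same slope $a$, which is what is meant by saying that the function plots as a line on the scales concerned.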

It remains to explore an extension of the idea of linearity, one which 
is suggested by the last remark. We turn from a linear form in one 
or more variables to a consideration of linearity in the growth of one 
variable in relation to another, and in particular to variation over 
time. The emphasis is on the time-path of a dynamic variable as 
opposed to a static value. 


14.2. Linear differential equations. The relation $y = be^{ax}$ represents
growth at a constant proportionate rate, e.g. it shows the growth of
a sum of money at various times ($x$ years) when interest is
compounded continuously at $100a$ per cent per year. Consider the
derivation of the relation. We are given only one fact: $y$ grows at the
given proportionate rate $a$. If $D$ is the operator for a derivative, the
given fact is:

$$\frac{1}{y}Dy = a \quad\text{or}\quad D\log y = a \qquad (1)$$
The relation between $y$ and $x$ is then to be found as an anti-derivative
or integral:
$$\log y = \int a\,dx + \text{constant} = ax + \text{constant}$$
giving $y = e^{ax + \text{constant}} = e^{\text{constant}}e^{ax}$,
i.e. $$y = be^{ax} \quad (b \text{ constant}) \qquad (2)$$

The '$b$' here is an arbitrary constant, but it has an interpretation as
the value of $y$ when $x = 0$. If we are given an extra fact, that $y = y_0$
when $x = 0$, then the relation is unambiguous:

$$y = y_0e^{ax} \quad (y_0 \text{ the initial value of } y) \qquad (3)$$


For example, this shows the amount £$y$ of an initial sum £$y_0$ at the
end of $x$ years at continuously compounded interest of $100a$ per cent
per year.
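As a hedged numerical illustration of the continuous case, the sketch below compares $y_0e^{ax}$ with interest compounded $m$ times a year. The discrete formula $y_0(1 + a/m)^{mx}$ is the standard compound-interest rule, brought in here as an assumption for comparison; the sum, rate and term are illustrative values.

```python
import math

def compounded(y0, a, x, m):
    """Amount after x years with interest at rate a compounded m times a year."""
    return y0 * (1.0 + a / m) ** (m * x)

def continuous(y0, a, x):
    """Amount after x years under continuous compounding, y = y0 e^(ax)."""
    return y0 * math.exp(a * x)

y0, a, x = 100.0, 0.05, 10            # assumed: £100 at 5 per cent for 10 years

amounts = [compounded(y0, a, x, m) for m in (1, 12, 365)]
limit = continuous(y0, a, x)
print(amounts, limit)
```

The amounts rise with the frequency of compounding and approach the continuous value from below.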

In reviewing this problem, we note that (1) is a 'differential
equation', an equation including the derivative $Dy$ as well as $y$. We
find a 'solution' by integration. In this case, it is (2), where $b$ is some
'arbitrary' constant. Or, it is (3), where an additional fact is known,
the 'initial' value $y_0$ at $x = 0$. The problem can be generalised.

We seek to express a variable $y$ as a function of another variable $x$.
We are given only a relation between $x$, $y$ and various derivatives:

DEFINITION: An (ordinary) differential equation is some relation:
$$F(x, y, Dy, D^2y, \dots, D^ny) = 0$$
and its order $n$ is that of the highest derivative $D^ny$ included.
Further, attach the label 'linear' to a particular case:

DEFINITION: A differential equation of order $n$ is linear if it is of the
form:
$$D^ny + a_1(x)D^{n-1}y + \dots + a_{n-1}(x)Dy + a_n(x)y = \phi(x)$$
where $a_1(x), \dots, a_{n-1}(x), a_n(x)$ are specified functions. It is linear with
constant coefficients if it is:
$$D^ny + a_1D^{n-1}y + \dots + a_{n-1}Dy + a_ny = \phi(x)$$
where $a_1, \dots, a_{n-1}, a_n$ are specified constants.

The notation here is in terms of the operator $D$. Differential equations
are often written with the alternative notation $\frac{d}{dx}$ for $D$; the linear
differential equation with constant coefficients is then:
$$\frac{d^ny}{dx^n} + a_1\frac{d^{n-1}y}{dx^{n-1}} + \dots + a_{n-1}\frac{dy}{dx} + a_ny = \phi(x).$$

To solve a differential equation is to find the form of the function
$y = f(x)$ which satisfies it. One simple case is illuminating, the first
order linear differential equation in which the term in $y$ is absent:
$Dy = \phi(x)$. The solution is known to be: $y = \int\phi(x)\,dx + \text{constant}$. In a
sense, solving a differential equation is a generalised form of finding
an integral or anti-derivative. For this reason, the solution is often
termed the integral of the differential equation.

In exploring the general nature of the solution, we can first enquire
why the first order equation $Dy = \phi(x)$ has just one arbitrary constant.
If we are given $y = f(x; A)$, including an arbitrary constant $A$, then
$Dy = f'(x; A)$. The constant $A$ can be eliminated between $y = f(x; A)$
and $Dy = f'(x; A)$ to give some relation between $x$, $y$ and $Dy$, i.e. to
give a first order differential equation. One derivative gets rid of an
arbitrary constant; integration brings it back again. Similarly, given
$y = f(x; A, B)$ with two arbitrary constants, then $A$ and $B$ can be
eliminated between $y$, $Dy = f'(x; A, B)$ and $D^2y = f''(x; A, B)$, and
the result is a second order differential equation. In general, if $n$
arbitrary constants are included in the relation of $y$ to $x$, then they
are eliminated in writing some $n$th order differential equation.
Conversely, we expect to find $n$ arbitrary constants in the general
solution of a differential equation of order $n$.

This has an important consequence; there must be all kinds of
particular functions satisfying a differential equation. For example,
suppose that the general solution of a second-order equation is
$y = Af_1(x) + Bf_2(x)$, where $A$ and $B$ are arbitrary constants. Then
$y = f_1(x)$ is a particular solution ($A = 1$, $B = 0$) and $y = f_2(x)$ is another
($A = 0$, $B = 1$); but so are $y = f_1(x) + f_2(x)$ and many others. It is of no
use checking (say) that $y = f_1(x)$ satisfies the equation and leaving it
at that. In this way a solution is obtained, but not the general
solution with the appropriate arbitrary constants.

The next step is to find the form of the general solution of a linear 
differential equation. Suppose the equation is of order n: 


$$D^ny + a_1D^{n-1}y + \dots + a_{n-1}Dy + a_ny = \phi(x) \qquad (4)$$
and write the corresponding homogeneous form:
$$D^ny + a_1D^{n-1}y + \dots + a_{n-1}Dy + a_ny = 0 \qquad (5)$$


where the a’s are given functions of x (or constants in the particular 
case). It can now be shown that the same additive property holds for 
linear differential equations as for linear algebraic equations (14.1). 
This is the justification for the term linear. Equations like (4) or (5) 
form a linear system. 

First, suppose that $y = y_1(x)$ and $y = y_2(x)$ are known (e.g. by
checking them in the equation) to satisfy the homogeneous form (5).
Then $y = A_1y_1(x) + A_2y_2(x)$ also satisfies (5), and for any constants
$A_1$ and $A_2$. For, on substituting:

$$D^n(A_1y_1 + A_2y_2) + a_1D^{n-1}(A_1y_1 + A_2y_2) + \dots + a_{n-1}D(A_1y_1 + A_2y_2) + a_n(A_1y_1 + A_2y_2)$$
$$= A_1(D^ny_1 + a_1D^{n-1}y_1 + \dots + a_{n-1}Dy_1 + a_ny_1) + A_2(D^ny_2 + a_1D^{n-1}y_2 + \dots + a_{n-1}Dy_2 + a_ny_2)$$
$= 0$ since $y_1$ and $y_2$ are solutions.


Next, suppose that $y = \bar{y}(x)$ is known to satisfy the non-homogeneous
form (4) and $y_1(x)$ is some solution of the homogeneous form (5).
Then $y = y_1(x) + \bar{y}(x)$ also satisfies (4). For on substituting:

$$D^n(y_1 + \bar{y}) + a_1D^{n-1}(y_1 + \bar{y}) + \dots + a_{n-1}D(y_1 + \bar{y}) + a_n(y_1 + \bar{y})$$
$$= (D^ny_1 + a_1D^{n-1}y_1 + \dots + a_{n-1}Dy_1 + a_ny_1) + (D^n\bar{y} + a_1D^{n-1}\bar{y} + \dots + a_{n-1}D\bar{y} + a_n\bar{y})$$
$= \phi(x)$ since the first bracket is zero and the second $\phi(x)$.
The two results can be combined and developed to establish:


THEOREM: The general solution of a linear differential equation of
order $n$ is:

$$y = A_1y_1(x) + A_2y_2(x) + \dots + A_ny_n(x) + \bar{y}(x) \qquad (6)$$
where $y_1(x), y_2(x), \dots, y_n(x)$ are $n$ different particular solutions of the
homogeneous form, where $\bar{y}(x)$ is any particular solution of the
non-homogeneous form, and where $A_1, A_2, \dots, A_n$ are arbitrary constants.

Here $\bar{y}(x)$ is called the particular integral and the linear combination
of the functions $y_1(x), y_2(x), \dots, y_n(x)$ is the complementary function.
It is to be stressed that the $n$ functions in the complementary
function must be all different and in the genuine sense, excluding
(e.g.) functions which differ only by constants. To ensure a general
solution, $n$ particular and different solutions of the homogeneous
form must be obtained and the linear combination of them written;
the addition of any solution of the original differential equation
completes the solution.

Given only the linear differential equation itself, there is not a
unique solution. There are $n$ arbitrary constants to be assigned; by
allotting various values to them, we get a range of particular
solutions. Suppose, however, that something else is given, i.e. $n$ initial
values such as:

$$y = y_0,\ Dy = y_0',\ D^2y = y_0'',\ \dots,\ D^{n-1}y = y_0^{(n-1)} \quad\text{at } x = 0.$$

Then, on substituting the general solution (6), $n$ equations are
obtained in the $n$ arbitrary constants and the $n$ initial values

$$y_0,\ y_0',\ y_0'',\ \dots,\ y_0^{(n-1)}.$$

From these, in general, the arbitrary constants can be expressed in
terms of the initial values. The solution (6) then becomes unique,
given the initial values. This is the usual form in which the problem
is presented. The solution of a linear differential equation of order $n$,
subject to $n$ initial conditions, is unique. It is (6) with the values of
$A_1, A_2, \dots, A_n$ expressed in terms of the given initial conditions. For
example, the differential equation (1), or $Dy - ay = 0$, is linear and
homogeneous, of first order, with constant coefficients. Its solution
is (2) where $b$ is an arbitrary constant. Given the initial condition
$y = y_0$ when $x = 0$, the solution is (3) and unique.


14.3. Solution of linear differential equations. The practical tech- 
niques for solving specific differential equations are many and various 
and they are not to be pursued here. Even for linear equations, it is 
one thing to write the solution in the form (6) of 14.2 but it is quite 
another matter to spell it out. The problem has been simplified, to 
the extent of transforming it into the problem of finding, first, the 
n particular constituents of the complementary function, and then 
the particular integral. This residual problem is far from easy. This 
is particularly so when the linear equation does not have constant 
coefficients. Indeed, in this case, it sometimes happens that no 
particular solution of the homogeneous form can be found in terms 
of known functions, i.e. that the equation defines a new function. 
This is a natural extension of the method of defining new functions 
by means of an integral (a first-order linear differential equation) as 
pursued in Chapter 12. An illustration is given in 14.9 Ex. 5. 

The case analysed here is the particular one of a linear differential
equation with constant coefficients. It is possible in this type of
equation to complete the solution in general terms, but only for the
complementary function with $n$ arbitrary constants. The question
of finding the particular integral is left open; at best it is something
of a hit-or-miss affair. Enough is done here to establish two things
of general interest and practical importance. One is that the solution
of a linear differential equation (with constant coefficients) is not
only of the same additive kind as that of a linear algebraic equation,
but is also obtained in practice by solving an algebraic equation.
The first step in finding the complementary function of a linear
differential equation is to reduce the equation to an algebraic
(polynomial) equation. Since we can always solve the latter, in one way
or another in practice, we can also solve the differential equation.

The other point is that the solutions of linear differential equations 
quite often, indeed usually, include oscillatory components of the 
form of the circular functions of 12.5. This is so of equations of as 
low an order as the second. Oscillatory movements appear in many 
problems in the natural and social sciences. The problems are framed 
by specifying the differential equations satisfied by the variables and 
the solution of the equations shows the oscillatory nature of the 
movements of the variables. 

The first-order differential equation with constant coefficients is
easily handled. It can be written: $Dy + ay = \phi(x)$. To obtain the
complementary function, which has only one term, write the
homogeneous form: $Dy = -ay$. So:

$$D\log y = \frac{1}{y}Dy = -a$$

i.e. $\log y = -ax + \text{constant}$
and $y = Ae^{-ax}$

is the complementary function with its single arbitrary constant $A$.
To complete, we must find a particular integral $\bar{y}(x)$ of the original
equation $Dy + ay = \phi(x)$. This is usually a matter of trial and error,
according to the form of $\phi(x)$. The general solution of the original
equation is then:
$$y = Ae^{-ax} + \bar{y}(x).$$

In the particular case $a = 0$, the complementary function is simply
$y = A$. To this, the particular integral $\bar{y}(x)$ is to be added. Directly,
the equation is: $Dy = \phi(x)$ and the solution is $y = \int\phi(x)\,dx + A$. The
particular integral is just the indefinite integral of $\phi(x)$. Another
example illustrates:

(i) $Dy + y = e^x$, with complementary function $y = Ae^{-x}$.
As a guess, try $\bar{y} = ke^x$ as a particular integral. Substituting
$\bar{y} = D\bar{y} = ke^x$:

$$D\bar{y} + \bar{y} = 2ke^x = e^x \quad\text{if } k = \tfrac{1}{2}.$$

Hence $\bar{y} = \tfrac{1}{2}e^x$ and the complete solution is $y = Ae^{-x} + \tfrac{1}{2}e^x$.
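Example (i) can be checked numerically by substituting the complete solution back into $Dy + y = e^x$. In the sketch below the derivative is estimated by a central difference, which is an assumption of the check and not part of the method of solution; the step size and sample values are illustrative.

```python
import math

def y(x, A):
    """Complete solution of example (i): A e^(-x) + (1/2) e^x."""
    return A * math.exp(-x) + 0.5 * math.exp(x)

def derivative(f, x, h=1e-5):
    """Central-difference estimate of Df at x (approximate, not exact)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def residual(x, A):
    """Dy + y - e^x, which should vanish for every A and x."""
    return derivative(lambda t: y(t, A), x) + y(x, A) - math.exp(x)

residuals = [abs(residual(x, A))
             for A in (-2.0, 0.0, 3.5)
             for x in (0.0, 0.5, 1.0, 2.0)]
print(max(residuals))
```

The residual is zero to within the error of the difference estimate for each value of the arbitrary constant $A$, as the theory requires.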

The second-order differential equation with constant coefficients
and its corresponding homogeneous form are:

$$D^2y + aDy + by = \phi(x) \qquad (1)$$
and
$$D^2y + aDy + by = 0 \qquad (2)$$
The particular integral $y = \bar{y}(x)$ is to be obtained, somehow, from (1).
The complementary function is $y = A_1y_1(x) + A_2y_2(x)$, where $y_1$ and $y_2$
are two different particular solutions of (2). It does not matter how
we obtain $y_1$ and $y_2$, as long as we get them. A trick suggested by the
solution of the first-order equation, and adopted with no apology, is
to try $y = e^{\lambda x}$ as a solution of (2) and to see whether we can find two
different $\lambda$'s. In (2), substitute $y = e^{\lambda x}$, $Dy = \lambda e^{\lambda x}$ and $D^2y = \lambda^2e^{\lambda x}$:

$$(\lambda^2 + a\lambda + b)e^{\lambda x} = 0.$$

Cancel $e^{\lambda x} > 0$ (for all $x$) and get the auxiliary equation:

$$\lambda^2 + a\lambda + b = 0 \qquad (3)$$
with two values of $\lambda$:
$$\lambda_1, \lambda_2 = \tfrac{1}{2}\{-a \pm \sqrt{a^2 - 4b}\}. \qquad (4)$$
The complementary function, the solution of (2), is then:
$$y = A_1e^{\lambda_1x} + A_2e^{\lambda_2x} \qquad (5)$$

provided only that the values of $\lambda$ given by (4) are different. The
complete solution of (1) is then written by the addition of the
particular integral $\bar{y}(x)$ to (5).

This remarkable result implies that the solution (complementary
function) of the differential equation (2) is got simply by replacing it
by the polynomial (algebraic) equation (3). The second-order (2)
gives the quadratic (3), the coefficients being the same. The result
generalises to a linear differential equation of any order with constant
coefficients; if (2) is of order $n$, then (3) is a polynomial equation of
$n$th degree. The complementary function (5) also contains $n$ terms,
corresponding to the $n$ roots of the auxiliary equation. There is
nothing more to be said in general. In practice, having reduced the
differential equation to the algebraic auxiliary equation, we
concentrate on getting the $n$ roots we know the auxiliary equation has.
The problems left to be tackled are problems of detail.

Pursuing the detail of the second-order equation, we distinguish 
the three cases of the roots (4) of the quadratic auxiliary equation. 


Case: $a^2 > 4b$. The roots $\lambda_1$ and $\lambda_2$ are real and distinct. The solution of
(1) is:
$$y = A_1e^{\lambda_1x} + A_2e^{\lambda_2x} + \bar{y}(x).$$

This is very similar to the solution of the first-order equation. The
only difference is that there are two exponential terms instead of one.
These terms are increasing or decreasing according as the $\lambda$'s are
positive or negative. For example, if $\lambda_1 > 0$ or $\lambda_2 > 0$, then in general
$|y| \to \infty$ as $x \to \infty$; if $\lambda_1 < 0$ and $\lambda_2 < 0$, then $y$ approaches $\bar{y}$ as $x \to \infty$.


Case: $a^2 = 4b$. The roots $\lambda_1$ and $\lambda_2$ are real and equal. Here there is a
residual difficulty; since there is only one $\lambda$, there is (as yet) only one
term in (5) and we need two. The single root is $\lambda = -\tfrac{1}{2}a$, with
$\lambda^2 = \tfrac{1}{4}a^2 = b$, both given by the coefficients of the differential
equation. The latter can, therefore, be written in homogeneous form:

$$D^2y - 2\lambda Dy + \lambda^2y = 0.$$
Another trick is needed to complete the solution. The one which
works is to try $y = xe^{\lambda x}$. On substituting $y = xe^{\lambda x}$, $Dy = (1 + \lambda x)e^{\lambda x}$ and
$D^2y = \lambda(2 + \lambda x)e^{\lambda x}$:
$$D^2y - 2\lambda Dy + \lambda^2y = \{\lambda(2 + \lambda x) - 2\lambda(1 + \lambda x) + \lambda^2x\}e^{\lambda x} = 0.$$
Hence, the second particular solution is $y = xe^{\lambda x}$ to add to the first,
$y = e^{\lambda x}$. The solution of (1) is:
$$y = (A_1 + A_2x)e^{\lambda x} + \bar{y}(x)$$
and this does not differ substantially from that of the first case.


Case: $a^2 < 4b$. The roots $\lambda_1$ and $\lambda_2$ are conjugate complex. This is the
case where the solution of (1) is oscillatory. It merits separate
examination (14.4).
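The three cases can be sorted mechanically from the coefficients. The following is a minimal sketch; the function name `auxiliary_roots` and the returned labels are illustrative, and the exact test for a zero discriminant is safe only for coefficients that are represented exactly (in general a small tolerance would be needed).

```python
import cmath
import math

def auxiliary_roots(a, b):
    """Classify the roots of the auxiliary equation lambda^2 + a*lambda + b = 0."""
    disc = a * a - 4.0 * b
    if disc > 0:
        r = math.sqrt(disc)
        return "real and distinct", ((-a + r) / 2.0, (-a - r) / 2.0)
    if disc == 0:
        return "real and equal", (-a / 2.0, -a / 2.0)
    r = cmath.sqrt(disc)              # purely imaginary when disc < 0
    return "conjugate complex", ((-a + r) / 2.0, (-a - r) / 2.0)

for a, b in [(3.0, 2.0), (2.0, 1.0), (2.0, 5.0)]:
    print((a, b), auxiliary_roots(a, b))
```

The first pair of coefficients gives two exponential terms, the second the form $(A_1 + A_2x)e^{\lambda x}$, and the third the oscillatory case.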

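Meanwhile, the following examples illustrate:

(ii) $D^2y + 3Dy + 2y = 0$ with auxiliary equation

$$\lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2) = 0.$$
The solution is: $y = A_1e^{-x} + A_2e^{-2x} = (A_1 + A_2e^{-x})e^{-x} \to 0$ as $x \to \infty$.
The equation $D^2y + 2Dy + y = 0$, with auxiliary equation
$$\lambda^2 + 2\lambda + 1 = (\lambda + 1)^2 = 0,$$
has a rather similar solution: $y = (A_1 + A_2x)e^{-x} \to 0$ as $x \to \infty$.

(iii) $D^2y - y + x = 0$. The complementary function is the solution of
$D^2y - y = 0$, with auxiliary equation $\lambda^2 - 1 = 0$, i.e. it is $y = A_1e^x + A_2e^{-x}$.
As a particular integral, try $y = kx$. On substitution: $-kx + x = 0$,
i.e. $k = 1$. Hence the solution of the equation is: $y = A_1e^x + A_2e^{-x} + x$.

If the problem is framed: find the solution of $D^2y - y + x = 0$, subject
to initial conditions $y = y_0$, $Dy = 0$ at $x = 0$, then, since
$Dy = A_1e^x - A_2e^{-x} + 1$,
$$y_0 = A_1 + A_2 \quad\text{and}\quad 0 = A_1 - A_2 + 1 \quad (\text{from } y \text{ and } Dy \text{ at } x = 0).$$
Hence, $A_1 = \tfrac{1}{2}(y_0 - 1)$ and $A_2 = \tfrac{1}{2}(y_0 + 1)$. The unique solution is:
$$y = \tfrac{1}{2}(y_0 - 1)e^x + \tfrac{1}{2}(y_0 + 1)e^{-x} + x.$$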
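The fitting of arbitrary constants to initial conditions can be sketched for the homogeneous equation $D^2y + 3Dy + 2y = 0$ of example (ii), whose auxiliary roots are $-1$ and $-2$. The initial values $y = 3$, $Dy = 0$ at $x = 0$ and the helper `fit_constants` are assumptions for illustration; the formulae in the helper come from solving $A_1 + A_2 = y_0$ and $\lambda_1A_1 + \lambda_2A_2 = Dy(0)$.

```python
import math

def fit_constants(l1, l2, y0, v0):
    """Solve A1 + A2 = y0 and l1*A1 + l2*A2 = v0 for distinct roots l1, l2."""
    A1 = (v0 - l2 * y0) / (l1 - l2)
    A2 = (l1 * y0 - v0) / (l1 - l2)
    return A1, A2

l1, l2 = -1.0, -2.0                   # roots of lambda^2 + 3*lambda + 2 = 0
A1, A2 = fit_constants(l1, l2, y0=3.0, v0=0.0)

def y(x):
    return A1 * math.exp(l1 * x) + A2 * math.exp(l2 * x)

def dy(x):
    return l1 * A1 * math.exp(l1 * x) + l2 * A2 * math.exp(l2 * x)

def d2y(x):
    return l1 * l1 * A1 * math.exp(l1 * x) + l2 * l2 * A2 * math.exp(l2 * x)

# The fitted solution should satisfy the equation at every x.
residuals = [abs(d2y(x) + 3.0 * dy(x) + 2.0 * y(x)) for x in (0.0, 0.7, 2.0)]
print(A1, A2, y(0.0), dy(0.0), max(residuals))
```

Two initial conditions fix the two constants, and the solution is then unique.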

A more general problem arises when there are several variables
($y$, $z$, $u$, ...), each a function of $x$, and subject to several simultaneous
differential equations. As a simple case, which can be generalised a
little (14.9 Ex. 10), consider a pair of first-order, linear, homogeneous
differential equations, in two variables, $y$ and $z$, each a function of $x$:

$$Dy = a_{11}y + a_{12}z \quad\text{and}\quad Dz = a_{21}y + a_{22}z \qquad (6)$$
where the $a$'s are given constants making up a matrix $A = \|a_{rs}\|$,
and giving a determinant $\Delta = |a_{rs}| = a_{11}a_{22} - a_{12}a_{21}$, for $r$ and $s = 1, 2$.
The variable $z$ can be eliminated from (6) by using the first equation:

$$z = \frac{1}{a_{12}}(Dy - a_{11}y) \quad\text{and so}\quad Dz = \frac{1}{a_{12}}(D^2y - a_{11}Dy)$$

and by substituting in the second equation:
$$\frac{1}{a_{12}}(D^2y - a_{11}Dy) = a_{21}y + \frac{a_{22}}{a_{12}}(Dy - a_{11}y) \quad (a_{12} \neq 0).$$
Hence $y$ satisfies the second-order linear differential equation:
$$D^2y - (a_{11} + a_{22})Dy + \Delta y = 0 \quad (a_{12} \neq 0) \qquad (7)$$
Equally, by using the second equation to give $y$ and $Dy$ and by
substituting in the first equation, we find that $z$ satisfies precisely
the same equation:
$$D^2z - (a_{11} + a_{22})Dz + \Delta z = 0 \quad (a_{21} \neq 0).$$
This is obvious enough from the symmetric way in which the $a$'s
appear in (7). Hence the movements of $y$ and $z$ as $x$ varies are
identical, apart from a multiplicative constant ($k$) which leaves (7)
unchanged. The solution of (7) is:
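$$y = A_1e^{\lambda_1x} + A_2e^{\lambda_2x} \quad (A_1 \text{ and } A_2 \text{ arbitrary})$$
where $\lambda_1$ and $\lambda_2$ are the roots of $\lambda^2 - (a_{11} + a_{22})\lambda + \Delta = 0$, with
$\Delta = a_{11}a_{22} - a_{12}a_{21}$. The first equation then gives:
$z = \frac{1}{a_{12}}(Dy - a_{11}y)$. On substitution for $y$:

$$z = k_1A_1e^{\lambda_1x} + k_2A_2e^{\lambda_2x} \quad\text{where}\quad k_1 = \frac{\lambda_1 - a_{11}}{a_{12}} \text{ and } k_2 = \frac{\lambda_2 - a_{11}}{a_{12}}.$$
The fact that $y$ and $z$ follow essentially the same path (e.g. over time
$x$) is a consequence of the assumption at the outset of the linear
forms (6).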

The paths of $y$ and $z$ are similar exponential growths (or declines)
if the auxiliary equation of (7) has real roots, i.e. if the determinant
$\Delta = a_{11}a_{22} - a_{12}a_{21}$ satisfies $\Delta < \tfrac{1}{4}(a_{11} + a_{22})^2$.
Otherwise, if the auxiliary equation has conjugate complex roots,
then $y$ and $z$ have similar oscillatory paths.
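A numerical sketch of the pair of equations (6) may help. The constants $a_{11} = 1$, $a_{12} = 2$, $a_{21} = 3$, $a_{22} = 2$ and the arbitrary constants are assumed values, chosen so that the auxiliary equation has real roots; the multipliers $k_1$, $k_2$ are those obtained from the first equation of (6).

```python
import math

a11, a12, a21, a22 = 1.0, 2.0, 3.0, 2.0     # assumed constants, a12 != 0

trace = a11 + a22
delta = a11 * a22 - a12 * a21               # determinant of the a's
disc = trace * trace - 4.0 * delta          # positive here, so real roots
lam1 = (trace + math.sqrt(disc)) / 2.0
lam2 = (trace - math.sqrt(disc)) / 2.0
k1 = (lam1 - a11) / a12
k2 = (lam2 - a11) / a12

A1, A2 = 0.7, -1.3                          # arbitrary constants

def y(x):
    return A1 * math.exp(lam1 * x) + A2 * math.exp(lam2 * x)

def z(x):
    return k1 * A1 * math.exp(lam1 * x) + k2 * A2 * math.exp(lam2 * x)

def dy(x):
    return lam1 * A1 * math.exp(lam1 * x) + lam2 * A2 * math.exp(lam2 * x)

def dz(x):
    return lam1 * k1 * A1 * math.exp(lam1 * x) + lam2 * k2 * A2 * math.exp(lam2 * x)

# Both equations of the pair should hold at every x.
residuals = []
for x in (0.0, 0.5, 1.0):
    residuals.append(abs(dy(x) - (a11 * y(x) + a12 * z(x))))
    residuals.append(abs(dz(x) - (a21 * y(x) + a22 * z(x))))
print(lam1, lam2, k1, k2, max(residuals))
```

Both variables are built from the same two exponential terms, which is the sense in which they follow essentially the same path.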

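As a particular case of (6), suppose that the matrix $A$ is singular,
i.e. that $\Delta = a_{11}a_{22} - a_{12}a_{21} = 0$. This means that the ratio $a_{11} : a_{21}$ is
the same as the ratio $a_{12} : a_{22}$. The auxiliary equation of (7) gives
$\lambda_1 = 0$ and $\lambda_2 = (a_{11} + a_{22}) = \rho$ (say). The solution is of the form:

$$y = A + Be^{\rho x} \quad\text{and}\quad z = k_1A + k_2Be^{\rho x} \quad (A \text{ and } B \text{ arbitrary}).$$

As $x \to \infty$ (e.g. as time goes on), and provided $\rho > 0$, $y$ and $z$ both
behave like $e^{\rho x}$ and $z/y \to k_2$ (constant).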
14.4. Oscillatory movements. We return to the linear and
homogeneous differential equation $D^2y + aDy + by = 0$ in the case $a^2 < 4b$
where the auxiliary equation $\lambda^2 + a\lambda + b = 0$ has conjugate complex
roots $\tfrac{1}{2}\{-a \pm i\sqrt{4b - a^2}\}$. A convenient notational change for the
structural constants ($a$ and $b$) of the equation is in order. Write the
conjugate complex roots as $\alpha \pm i\omega$, where $\alpha$ and $\omega$ are constants
given in terms of $a$ and $b$, i.e. $\alpha$ and $\omega$ are (alternative) structural
constants of the equation. The relations between the alternative
constants are: $\alpha = -\tfrac{1}{2}a$ and $\omega = \tfrac{1}{2}\sqrt{4b - a^2}$. Hence:

$$a = -2\alpha \quad\text{and}\quad b = \alpha^2 + \omega^2.$$

The differential equation can be written in terms of the new structural
constants:

$$D^2y - 2\alpha Dy + (\alpha^2 + \omega^2)y = 0$$
and the solution can be written
$$y = A_1e^{(\alpha + i\omega)x} + A_2e^{(\alpha - i\omega)x} \qquad (1)$$

for arbitrary constants $A_1$ and $A_2$ which can also be complex values.


This solution, though neat, can be developed into alternative
forms, which are of greater use in practice, and which do not involve
complex values. The differential equation is in a real variable $y$ and it
has real coefficients involving $\alpha$ and $\omega$. The complex values in (1) are
merely an intermediate step to a real solution.

The results of 12.6 and 12.7 come into play in this development.
Write (1) as:

$$y = e^{\alpha x}(A_1e^{i\omega x} + A_2e^{-i\omega x})$$
$$= e^{\alpha x}\{A_1(\cos\omega x + i\sin\omega x) + A_2(\cos\omega x - i\sin\omega x)\}$$
i.e. $$y = e^{\alpha x}(B_1\cos\omega x + B_2\sin\omega x) \qquad (2)$$

where $B_1 = A_1 + A_2$ and $B_2 = i(A_1 - A_2)$ are also arbitrary constants.
We now have $y$ in its proper real form so that $B_1$ and $B_2$ must be real
constants. (It follows that the original constants $A_1$ and $A_2$ are
conjugate complex values.) The solution (2) shows the real path of
$y$, as $x$ changes, more explicitly than the equivalent (1).

A further shift in the notation for the arbitrary
constants can be made. The nature of the change
is seen in Fig. 14.4a; it is, in effect, a switch from
the Cartesian co-ordinates $(B_1, B_2)$ of a point $P$
to the polar co-ordinates $(A, \varepsilon)$. Write:

$$B_1 = A\cos\varepsilon \quad\text{and}\quad B_2 = A\sin\varepsilon$$
which is the same thing as writing:
$$A = \sqrt{B_1^2 + B_2^2} \quad\text{and}\quad \varepsilon = \tan^{-1}(B_2/B_1)$$
(see Appendix A.9).
Hence, since $B_1$ and $B_2$ are arbitrary real constants, so are $A$ and $\varepsilon$.
Then (2) becomes:

$$y = e^{\alpha x}(A\cos\omega x\cos\varepsilon + A\sin\omega x\sin\varepsilon).$$
By use of the addition formula, (2) of 12.5:
$$y = Ae^{\alpha x}\cos(\omega x - \varepsilon) \qquad (3)$$

which is the most concise and convenient form for the solution of the
differential equation. There are four parameters in (3) and it is
important to distinguish between them. Two of them, $\alpha$ and $\omega$, are
given by the structure of the differential equation taken; they
correspond to the inherent or structural variation of $y$. The other
two, $A$ and $\varepsilon$, are arbitrary constants to be given by initial conditions;
they correspond to the 'accidental' variation of $y$, arising because it
starts off in a particular way.
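The switch from the Cartesian pair $(B_1, B_2)$ to the polar pair $(A, \varepsilon)$ can be sketched as below. One assumption is made beyond the text: the angle is taken with the two-argument arctangent `math.atan2`, which selects the correct quadrant where $\tan^{-1}(B_2/B_1)$ alone would be ambiguous. The numerical values are illustrative.

```python
import math

def to_amplitude_phase(B1, B2):
    """Polar co-ordinates (A, eps) of the point with Cartesian co-ordinates (B1, B2)."""
    return math.hypot(B1, B2), math.atan2(B2, B1)

alpha, omega = -0.2, 3.0              # assumed structural constants
B1, B2 = -1.5, 2.0                    # assumed arbitrary constants
A, eps = to_amplitude_phase(B1, B2)

def cartesian_form(x):
    """e^(alpha x) (B1 cos(omega x) + B2 sin(omega x))."""
    return math.exp(alpha * x) * (B1 * math.cos(omega * x) + B2 * math.sin(omega * x))

def polar_form(x):
    """A e^(alpha x) cos(omega x - eps)."""
    return A * math.exp(alpha * x) * math.cos(omega * x - eps)

differences = [abs(cartesian_form(x) - polar_form(x)) for x in (0.0, 0.4, 1.7, 5.0)]
print(A, eps, max(differences))
```

The two forms agree at every $x$, so nothing is lost in passing from two constants $B_1$, $B_2$ to the amplitude and phase.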

The function (3) is a generalised circular function, called a
sinusoidal function, taking its shape (when represented graphically) from
the regular and symmetric oscillation of the cosine function (12.7
above). The dependence of the oscillation on the four parameters
needs careful examination. Write (3) in two parts:

$$y = uv \quad\text{where}\quad u = Ae^{\alpha x} \text{ and } v = \cos(\omega x - \varepsilon).$$

Then $v$ is the oscillatory term and $u$ simply serves to amplify or
damp the cycle. If $\alpha = 0$, then $u = A$ and the oscillation of $v$ is
everywhere amplified in the ratio $A : 1$. If $\alpha < 0$, then $u = Ae^{\alpha x}$ decreases
exponentially to zero as $x$ increases, and the oscillations of $v$ are
diminished as $x$ increases; this is the case of damping. If $\alpha > 0$, then
$u = Ae^{\alpha x}$ increases exponentially as $x$ increases and the amplification of
$v$ increases to match; this is the anti-damping case. The oscillatory
term $v = \cos(\omega x - \varepsilon)$ is represented graphically in Fig. 14.4b; it is the
graph of the function $\cos x$ with the $x$ variable re-scaled and measured
from another origin. A peak $\cos x = 1$ occurs where $x = 0$; a peak
$\cos(\omega x - \varepsilon) = 1$ occurs where $x = \varepsilon/\omega$. Hence the phasing of
$v = \cos(\omega x - \varepsilon)$ is fixed by the peak $v = 1$ at $x = \varepsilon/\omega$. The re-scaling
of the $x$ variable is such that, whereas $\cos x$ completes a cycle in the
interval $0 < x < 2\pi$ and then repeats in every interval of $2\pi$, $\cos(\omega x - \varepsilon)$
has a repeating cycle over an interval of $2\pi/\omega$. This is the period of
$v = \cos(\omega x - \varepsilon)$. The phase and period are indicated graphically in
Fig. 14.4b.

[Fig. 14.4b: graph of $v = \cos(\omega x - \varepsilon)$]

We now combine the two terms of $y = uv = Ae^{\alpha x}\cos(\omega x - \varepsilon)$. If
$\alpha = 0$, the cosine cycle of Fig. 14.4b is amplified by the amplitude $A$;
its shape is unchanged except that it oscillates between $\pm A$ instead
of $\pm 1$. If $\alpha < 0$, the cosine cycle is progressively diminished in
amplitude as $x$ increases, according to the damping factor $(-\alpha)$, the
case illustrated in Fig. 14.4c. Similarly, if $\alpha > 0$, the cosine cycle is
progressively amplified, according to the anti-damping factor $\alpha$. The
variation of $y$ is similar to that of Fig. 14.4c, except that the
oscillation is not damped but rather explosive (or anti-damped) as $x$
increases. Hence:

[Fig. 14.4c: damped oscillation; case $\alpha\ (= -0.2) < 0$]


THEOREM: The circular or sinusoidal function $y = Ae^{\alpha x}\cos(\omega x - \varepsilon)$
has a symmetric oscillation of the cosine form with the features: Period
given by $2\pi/\omega$; Phase given by a peak at $x = \varepsilon/\omega$; Amplitude given
initially by $A$; Damping indicated by $(-\alpha)$.
The period $2\pi/\omega$ represents the interval of $x$ over which a complete
cycle of $y$ takes place; the cycle is then repeated in each successive
interval of length $2\pi/\omega$. An alternative expression of the same feature
is by specification of the 'frequency' of the oscillation, i.e. the
number of times the cycle repeats in a unit interval of $x$. The
frequency is $\omega/2\pi$ cycles per unit of $x$, or $\omega$ cycles per interval $2\pi$ of
$x$. Hence, we can speak of the frequency $\omega$ of $y = Ae^{\alpha x}\cos(\omega x - \varepsilon)$
as compared with the unit frequency of the cosine function $\cos x$. The
frequency is the number of complete cycles achieved by the function
in an interval $2\pi$ of $x$. The frequency and period are reciprocal to
each other; it is useful to use period when the cycle is long and
frequency when it is short.
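The features listed in the theorem can be computed directly from the four parameters. The sketch below uses assumed values for the parameters and a hypothetical helper `features`; it also checks that moving on by one period multiplies $y$ by the constant factor $e^{\alpha T}$, which is how the damping shows itself from cycle to cycle.

```python
import math

def features(A, alpha, omega, eps):
    """Period, frequency, phase and damping of y = A e^(alpha x) cos(omega x - eps)."""
    return {
        "period": 2.0 * math.pi / omega,
        "frequency": omega / (2.0 * math.pi),   # cycles per unit of x
        "peak_of_cosine_at": eps / omega,
        "damping": -alpha,
    }

def y(x, A, alpha, omega, eps):
    return A * math.exp(alpha * x) * math.cos(omega * x - eps)

params = dict(A=2.0, alpha=-0.2, omega=4.0, eps=1.0)   # assumed values
info = features(**params)
T = info["period"]

x0 = 0.3
ratio = y(x0 + T, **params) / y(x0, **params)          # effect of one full period
print(info, ratio, math.exp(params["alpha"] * T))
```

The ratio does not depend on the point $x_0$ chosen, nor on $A$ and $\varepsilon$; it is fixed by the structural constants alone.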

To summarise: the linear differential equation

$$D^2y - 2\alpha Dy + (\alpha^2 + \omega^2)y = 0,$$

where $\alpha$ and $\omega$ are structural constants, has the oscillatory solution
$y = Ae^{\alpha x}\cos(\omega x - \varepsilon)$ where $A$ and $\varepsilon$ are arbitrary constants. The
structure of the equation fixes the period $2\pi/\omega$ of oscillation and the
extent of the damping $(-\alpha)$. The arbitrary constants (i.e. the initial
conditions) fix the amplitude $A$ and the phase (given by $\varepsilon/\omega$). No
matter what initial conditions are imposed, the oscillation of $y$ given
by the differential equation has a fixed period and a fixed damping.


14.5. The use of the operator D. The operator $D = \frac{d}{dx}$ for derivative
and its inverse $D^{-1}$ for anti-derivative or integral are extended to
form a group under multiplication (10.8 above) by writing powers
$D^n$ for the $n$th derivative ($n$ positive) or the $(-n)$th integral ($n$
negative). It is perfectly possible, simply by imposing the appropriate
definitions, to extend further and to incorporate sums and scalar
products. The extension is most easily seen by taking it in two
stages.

NOTATION: The polynomial operator $D^n + a_1D^{n-1} + \dots + a_{n-1}D + a_n$
is such that
$$(D^n + a_1D^{n-1} + \dots + a_{n-1}D + a_n)y = D^ny + a_1D^{n-1}y + \dots + a_{n-1}Dy + a_ny$$
for any positive integer $n$ and for any function $y$ with derivatives up to
the $n$th.
With this notation, it is found that polynomial operators follow the
same algebraic processes as algebraic polynomials. For example:
(i) $(D^2 + D)y = D^2y + Dy$ by the notation. Further:
$$D(D + 1)y = D(Dy + y) = D^2y + Dy = (D^2 + D)y.$$
Hence the operator $D^2 + D$ can be factorised $D(D + 1)$. Similarly:
$$(D - 1)(D + 1)y = (D - 1)(Dy + y) = D(Dy + y) - (Dy + y)$$
$$= D^2y + Dy - Dy - y = D^2y - y = (D^2 - 1)y$$
and so the operator $(D - 1)(D + 1) = D^2 - 1$ and again factorisation
is valid.
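The factorisation of polynomial operators can be tried out on functions that are themselves polynomials in $x$, for which differentiation is exact. In the sketch below, both a function $c_0 + c_1x + c_2x^2 + \dots$ and an operator $a_0 + a_1D + a_2D^2 + \dots$ are held as lists of coefficients; this representation and the helper names are assumptions made for the illustration.

```python
def derivative(poly):
    """Differentiate c0 + c1*x + c2*x**2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(poly)][1:] or [0]

def add(p, q):
    """Add two coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [s + t for s, t in zip(p, q)]

def apply(operator, poly):
    """Apply a0 + a1*D + a2*D**2 + ... (as [a0, a1, a2, ...]) to the function poly."""
    result, current = [0], list(poly)
    for a in operator:
        result = add(result, [a * c for c in current])
        current = derivative(current)
    return result

def multiply(p, q):
    """Ordinary algebraic product of two polynomials in D."""
    out = [0] * (len(p) + len(q) - 1)
    for i, s in enumerate(p):
        for j, t in enumerate(q):
            out[i + j] += s * t
    return out

y = [5, -2, 0, 4]                          # y = 5 - 2x + 4x^3
d_plus_1, d_minus_1 = [1, 1], [-1, 1]      # the operators D + 1 and D - 1

step_by_step = apply(d_minus_1, apply(d_plus_1, y))   # (D - 1) applied to (D + 1)y
product = multiply(d_minus_1, d_plus_1)               # algebraic product of the operators
all_at_once = apply(product, y)
print(product, step_by_step, all_at_once)
```

Applying the factors in succession gives the same function as applying their algebraic product $D^2 - 1$ in one step.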
The next notation completes the extension:

NOTATION: The rational fraction operator $F(D) = \dfrac{F_1(D)}{F_2(D)}$, where
$F_1(D)$ and $F_2(D)$ are polynomial operators, is such that:

$$\text{if}\quad z = F(D)y = \frac{F_1(D)}{F_2(D)}y \quad\text{then}\quad F_1(D)y = F_2(D)z.$$

This is, in effect, an extension of the use of $\frac{1}{D} = D^{-1}$ for anti-derivative.
For, if $\frac{1}{D}y = z$, then $y = Dz$, i.e. if $\int y\,dx = z$ then $y = \frac{dz}{dx}$. To illustrate:
(ii) $$\frac{D^2 + 1}{D}y = D^{-1}(D^2y + y) = D^{-1}D^2y + D^{-1}y = Dy + D^{-1}y = (D + D^{-1})y$$
and so $\dfrac{D^2 + 1}{D} = D + D^{-1}$, as obtained by dividing through by $D$.
Similarly:
$$\frac{D^2 - 1}{D + 1}y = z \quad\text{means}\quad (D^2 - 1)y = (D + 1)z.$$

Hence, $(D + 1)(D - 1)y = (D + 1)z$

i.e. $(D - 1)y = z$ since $D + 1$ applied to each side gives equivalent
expressions. So $\dfrac{D^2 - 1}{D + 1} = D - 1$, as obtained by dividing through by $D + 1$.

These examples serve to show that manipulations of operators
which are polynomials (or ratios of polynomials) in $D = \frac{d}{dx}$ follow all
the algebraic processes. Such operators are applied to any function
with the appropriate number of derivatives. They are to be written
before the function, and never following it, since $Dy$ has meaning
but $yD$ not.

In technical language, the new notations achieve the adjunction 
of an outside element D to the field of real numbers (the a’s). The 
result is a rational fraction operator $F(D)$ and all such operators
form a field, with the appropriate sums and products. Indeed, the 
operators not only form a field, but also a vector space over the field 
of real numbers. They are like complex numbers or rational (algebraic) 
fractions, forming a set which is a field with a scalar product opera- 
tion, i.e. a vector space with the structure of a field. In short, they 
are perfectly well-behaved algebraically. 




It is useful in practice to have the result of applying a rational 
fraction operator in D to various specific functions. Two simple cases 
are given here, for the exponential and circular functions. Other 
cases can be given, as in 14.9 Ex. 7. 

Exponential functions. From the standard form, $De^{\lambda x} = \lambda e^{\lambda x}$, we get:

$F(D)e^{\lambda x} = F(\lambda)e^{\lambda x}$    (1)

Here $F(x)$ is a rational fraction, $F(\lambda)$ is the real value obtained by
substituting $\lambda$ for $x$ and $F(D)$ is the corresponding operator in D. It
is assumed in (1) that the denominator in $F(\lambda)$ is not zero. The result
(1) can be summarised by saying that, in writing any expression
including derivatives of an exponential function $e^{\lambda x}$, we can substitute
$D = \lambda$. See 14.9 Ex. 6.

Circular functions. From the standard forms, $D\cos\omega x = -\omega\sin\omega x$
and $D\sin\omega x = \omega\cos\omega x$, we get:

$D^2\cos\omega x = -\omega^2\cos\omega x$ and $D^2\sin\omega x = -\omega^2\sin\omega x$.
So: $F(D^2)\cos\omega x = F(-\omega^2)\cos\omega x$ and $F(D^2)\sin\omega x = F(-\omega^2)\sin\omega x$    (2)

In (2), the polynomial or rational fraction $F(D^2)$ consists only of even
powers of D. The result can be summarised by saying that, for
derivatives of a circular function $\cos\omega x$ or $\sin\omega x$, we can substitute
$D^2 = -\omega^2$.
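Some examples illustrate (1) and (2) in practice:
(iii) $(D^2 - 1)e^{\lambda x} = (\lambda^2 - 1)e^{\lambda x}$, which is checked:
$(D^2 - 1)e^{\lambda x} = D^2e^{\lambda x} - e^{\lambda x} = \lambda^2e^{\lambda x} - e^{\lambda x} = (\lambda^2 - 1)e^{\lambda x}$.

Further, if $\frac{1}{D^2-1}e^{\lambda x} = z$, then the
notation means that $(D^2 - 1)z = e^{\lambda x}$. But

$(D^2 - 1)\left(\frac{e^{\lambda x}}{\lambda^2 - 1}\right) = \frac{1}{\lambda^2 - 1}(D^2 - 1)e^{\lambda x} = e^{\lambda x}$.

Hence, $z = \frac{1}{D^2 - 1}e^{\lambda x} = \frac{e^{\lambda x}}{\lambda^2 - 1}$, as required.

(iv) $(D^2 + 1)\cos\omega x = (1 - \omega^2)\cos\omega x$, which is obtained directly:
$(D^2 + 1)\cos\omega x = D^2\cos\omega x + \cos\omega x = -\omega^2\cos\omega x + \cos\omega x = (1 - \omega^2)\cos\omega x$.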


Again $\frac{1}{D^2+1}\sin\omega x = \frac{\sin\omega x}{1 - \omega^2}$, to be checked by showing that

$(D^2 + 1)\left(\frac{\sin\omega x}{1 - \omega^2}\right) = \sin\omega x$

and this is so.

The development of the operator D in this way is designed to lead 
to a practical method of solving linear differential equations with 
constant coefficients. Write the homogeneous form of an nth order 
equation of this type: 

$F(D)y = 0$ where $F(D) = D^n + a_1D^{n-1} + \dots + a_{n-1}D + a_n$    (3)
Factorise the polynomial $F(D)$ into:
$F(D) = (D - \lambda_1)(D - \lambda_2)\dots(D - \lambda_n)$

where $\lambda_1, \lambda_2, \dots \lambda_n$ are the roots (real or complex) of the auxiliary
equation $F(D) = 0$. This means no more than that the auxiliary
equation $F(\lambda) = 0$ has these n roots. Then, if $D = \lambda_1$, $F(D) = 0$ and we
have a particular solution of (3). For $D = \lambda_1$ implies that a solution
$y_1$ is such that $Dy_1 = \lambda_1y_1$, i.e. such that $y_1 = e^{\lambda_1x}$. Similarly for
$\lambda_2, \lambda_3, \dots \lambda_n$. Hence we obtain the result of 14.3, generalised and in a
slightly different form:

THEOREM: The general solution of $F(D)y = 0$ where

$F(D) = D^n + a_1D^{n-1} + \dots + a_{n-1}D + a_n$

is: $y = A_1e^{\lambda_1x} + A_2e^{\lambda_2x} + \dots + A_ne^{\lambda_nx}$
where $D = \lambda_1, \lambda_2, \dots \lambda_n$ are the roots of $F(D) = 0$, provided they are all
different, and where $A_1, A_2, \dots A_n$ are arbitrary constants.
The case of multiple roots of $F(D) = 0$ raises the difficulty met and
solved in 14.3. If $D = \lambda$ is a double root of $F(D) = 0$, then the corresponding
part of the solution y is $(A_1 + A_2x)e^{\lambda x}$. In this way, the
correct number of arbitrary constants is maintained. The result
generalises to triple and higher multiple roots (14.9 Ex. 8).

Next, write the non-homogeneous equation $F(D)y = \phi(x)$, where
$F(D)$ is the same polynomial operator as before. We can also write:

$y = \frac{1}{F(D)}\phi(x)$    (4)

since this notation simply means that $F(D)y = \phi(x)$. However, if
$\phi(x)$ is of suitable form, we can apply results such as (1) and (2)
above to transform (4) into a function of x, derived from $\phi(x)$. The
result is a particular integral of the differential equation $F(D)y = \phi(x)$.

THEOREM: A particular integral of $F(D)y = \phi(x)$ is to be obtained
as a function of x from: $y = \frac{1}{F(D)}\phi(x)$ according to the form of $\phi(x)$. The
general solution of the differential equation is then the particular integral
added to the complementary function of the previous theorem.

The application of this theorem depends entirely on having
operator results of the kind shown in (1) and (2). The following
examples illustrate:


(v) $D^2y - y = e^{2x}$. The complementary function is obtained from the
roots $D = \pm 1$ of the auxiliary equation $D^2 - 1 = 0$; it is
$y = A_1e^{x} + A_2e^{-x}$.
The particular integral is given by

$y = \frac{1}{D^2 - 1}e^{2x} = \frac{1}{2^2 - 1}e^{2x} = \tfrac{1}{3}e^{2x}$.

The complete solution is: $y = A_1e^{x} + A_2e^{-x} + \tfrac{1}{3}e^{2x}$.
(vi) $D^2y - y = \cos\omega x$. The complementary function is that of
example (v). The particular integral is $y = \frac{1}{D^2 - 1}\cos\omega x = -\frac{\cos\omega x}{1 + \omega^2}$
and the complete solution is: $y = A_1e^{x} + A_2e^{-x} - \frac{\cos\omega x}{1 + \omega^2}$.
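
The particular integrals of examples (v) and (vi) can be checked numerically. This is only a sketch: it reads the right-hand side of example (v) as $e^{2x}$, picks an arbitrary value of $\omega$ for example (vi), and replaces $D^2$ by a central-difference approximation, so the residuals are small rather than exactly zero. The function names are illustrative.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to D^2 f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def residual(y, phi, x):
    """Value of (D^2 - 1)y - phi at x; close to zero if y is a particular integral."""
    return second_derivative(y, x) - y(x) - phi(x)


# Example (v): (D^2 - 1)y = e^{2x}, with D replaced by 2 in 1/(D^2 - 1).
def y_v(x):
    return math.exp(2 * x) / (2 ** 2 - 1)


def phi_v(x):
    return math.exp(2 * x)


# Example (vi): (D^2 - 1)y = cos(omega x), with D^2 replaced by -omega^2.
omega = 1.5   # arbitrary illustrative value


def y_vi(x):
    return -math.cos(omega * x) / (1 + omega ** 2)


def phi_vi(x):
    return math.cos(omega * x)


points = [0.0, 0.25, 0.5, 0.75, 1.0]
worst_v = max(abs(residual(y_v, phi_v, x)) for x in points)
worst_vi = max(abs(residual(y_vi, phi_vi, x)) for x in points)
print(worst_v, worst_vi)
```

Adding any multiple of $e^{x}$ or $e^{-x}$ to either particular integral leaves the residual unchanged, which is the role of the complementary function.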


14.6. Linear difference equations. A problem may be put in terms of
the rate of growth of a function $y = f(x)$ as x increases continuously.
Alternatively, it may be expressed as the change in the variable y
over a regular sequence of discrete values of x or over discrete
intervals of x. If the unit for x is selected as the fixed interval, we can
write $y_n = f(n)$ for n = 0, 1, 2, 3, ... from a starting point $y_0$ at n = 0.
A case in point is the familiar problem of compound interest when
interest is compounded annually at 100a per cent per year. The
result for the amount £$y_n$ of an initial £$y_0$ after n years is:

$y_n = y_0(1 + a)^n$.

This is to be derived from the fact that, in one year after the nth,
£$y_n$ grows to £$y_{n+1}$ by the addition of £$ay_n$ of interest: $y_{n+1} - y_n = ay_n$.
Hence, write:

$y_{n+1} - (1 + a)y_n = 0$    (1)

as a 'difference equation' from which to derive $y_n = y_0(1 + a)^n$ in terms
of an arbitrary constant or initial value $y_0$. This matches the differential
equation of 14.2 in the corresponding problem of interest compounded
continuously.
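
The step from the difference equation (1) to the compound interest formula can be followed by iterating year by year. A minimal sketch, with an illustrative principal and rate that are not taken from the text:

```python
def iterate_interest(y0, a, years):
    """Apply y[n+1] = (1 + a) * y[n] repeatedly, returning y[0], ..., y[years]."""
    values = [y0]
    for _ in range(years):
        values.append((1 + a) * values[-1])
    return values


def compound_amount(y0, a, n):
    """Closed form y[n] = y0 * (1 + a)**n."""
    return y0 * (1 + a) ** n


amounts = iterate_interest(100.0, 0.05, 10)     # £100 at 5 per cent for 10 years
largest_gap = max(abs(amounts[n] - compound_amount(100.0, 0.05, n)) for n in range(11))
print(amounts[-1], largest_gap)
```

The two routes agree up to rounding error; the initial value $y_0$ plays the part of the single arbitrary constant of a first-order equation.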

The theory of difference equations, of which (1) is a simple 
example, follows very closely the corresponding theory of differential 
equations. 

DEFINITION: An (ordinary) difference equation is a relation between
successive values of a discrete variable $y_n$, for n = 0, 1, 2, ...:

$F(n, y_n, y_{n+1}, \dots y_{n+r}) = 0$
where r is the order of the equation. The equation is linear if it is:
$y_{n+r} + a_1y_{n+r-1} + \dots + a_{r-1}y_{n+1} + a_ry_n = \phi(n)$
where the coefficients are constants or dependent on n.


Following the argument of 14.2, we expect that any difference
equation of order r has a solution with r arbitrary constants, each
of which can be eliminated by one process of differencing. Further, a
linear equation

$y_{n+r} + a_1y_{n+r-1} + \dots + a_{r-1}y_{n+1} + a_ry_n = \phi(n)$    (2)
has a corresponding homogeneous form:
$y_{n+r} + a_1y_{n+r-1} + \dots + a_{r-1}y_{n+1} + a_ry_n = 0$    (3)

It then follows that, if $y_n = f_1(n)$ and $y_n = f_2(n)$ are two solutions of (3),
so is $y_n = A_1f_1(n) + A_2f_2(n)$ for any constants $A_1$ and $A_2$. Further, if
$y_n = f(n)$ is a particular solution of (2) and $y_n = f_1(n)$ a solution of (3),
then $y_n = f_1(n) + f(n)$ is also a solution of (2). Hence:

THEOREM: The general solution of the linear difference equation (2) is:

$y_n = A_1f_1(n) + A_2f_2(n) + \dots + A_rf_r(n) + f(n)$    (4)
where $f_1(n), f_2(n), \dots f_r(n)$ making up the complementary function are r
different solutions of the homogeneous form (3), where $f(n)$ is the
particular integral, any solution of the equation (2), and where $A_1$,
$A_2, \dots A_r$ are arbitrary constants.


The arbitrary constants can be expressed in terms of r initial
conditions, usually expressed as the given initial values of $y_n$:

$y_0$ at n = 0; $y_1$ at n = 1; ... $y_{r-1}$ at n = r - 1.

With these initial values, the difference equation (2) provides, by a
process of iteration, each succeeding value: $y_r, y_{r+1}, y_{r+2}, \dots$ . For
(2) gives $y_r$ in terms of $y_0, y_1, \dots y_{r-1}$ (given), then $y_{r+1}$ in terms of
$y_1, y_2, \dots y_r$, and so on. This is always possible, but it often fails to
give $y_n$ explicitly as a function of n. An example illustrates:

(i) $y_{n+1} - 2y_n = n + 2$, given $y_0 = 2$ at n = 0.

In succession: $y_1 = 2y_0 + 2 = 6$; $y_2 = 2y_1 + 3 = 15$; $y_3 = 2y_2 + 4 = 34$; ...
Hence, for n = 0, 1, 2, 3, ..., $y_n$ = 2, 6, 15, 34, ... . This still does not
make it clear what $y_n$ is as a function of n. Other methods need to be
sought for this.

When the linear difference equation has constant coefficients, a
general method can be devised for writing the solution (4). The first-order
equation is: $y_{n+1} + ay_n = \phi(n)$. The particular integral is any
one solution which can be found. The complementary function is to
be got from the homogeneous form: $y_{n+1} = (-a)y_n$. Hence

$y_n = (-a)y_{n-1} = (-a)^2y_{n-2} = \dots$,

giving: $y_n = A(-a)^n$, n = 0, 1, 2, ...
where A is arbitrary. This is to be compared with the solution
$y = Ae^{-ax}$ for the corresponding differential equation $Dy + ay = 0$.
The power function $(-a)^x$, with x integral, appears instead of the
exponential $e^{-ax}$. The form of the solution depends on the value of
the constant a. The variable $y_n$ increases or decreases in absolute
value according as $|a| > 1$ or $|a| < 1$; the sign of a then determines
whether $y_n$ varies steadily or by alternation in sign. To return to the
example:

(ii) $y_{n+1} - 2y_n = n + 2$ has complementary function $y_n = A2^n$. For a
particular integral, try $y_n = \alpha n + \beta$ and attempt to find $\alpha$ and $\beta$. So:

$y_{n+1} - 2y_n = \alpha(n+1) + \beta - 2(\alpha n + \beta) = -\alpha n + (\alpha - \beta)$.
This is to be equal to n + 2 for all n, i.e. $-\alpha = 1$ and $\alpha - \beta = 2$. Hence,
$\alpha = -1$, $\beta = -3$. The particular integral is $y_n = -(n + 3)$. The complete
solution is:
$y_n = A2^n - (n + 3)$.
Given $y_n = y_0$ at n = 0: $y_0 = A - 3$. Hence:
$y_n = (y_0 + 3)2^n - (n + 3)$.
If $y_0 = 2$, then $y_n = 5 \cdot 2^n - (n + 3)$ for n = 0, 1, 2, ...
i.e. the sequence $y_n$ = 2, 6, 15, 34, ... of example (i) above.
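
The agreement between the iteration of example (i) and the explicit solution of example (ii) can be confirmed directly. A short sketch, with function names chosen here for illustration:

```python
def iterate_example(y0, steps):
    """Iterate y[n+1] = 2*y[n] + n + 2 from y[0] = y0, returning y[0], ..., y[steps]."""
    values = [y0]
    for n in range(steps):
        values.append(2 * values[-1] + n + 2)
    return values


def explicit_solution(y0, n):
    """Complete solution y[n] = (y0 + 3) * 2**n - (n + 3)."""
    return (y0 + 3) * 2 ** n - (n + 3)


by_iteration = iterate_example(2, 20)
by_formula = [explicit_solution(2, n) for n in range(21)]
print(by_iteration[:4], by_formula[:4])
```

Integer arithmetic is exact here, so the two lists coincide term by term; a different starting value $y_0$ only changes the constant $A = y_0 + 3$.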


The second-order difference equation $y_{n+2} + ay_{n+1} + by_n = \phi(n)$ has
the homogeneous form:

$y_{n+2} + ay_{n+1} + by_n = 0$    (5)
Apart from a particular integral (to be obtained however we can),
the solution of the equation reduces to getting the complementary
function from (5). The solution in the first-order case suggests trying
$y_n = \lambda^n$ for appropriate $\lambda$. From (5):

$\lambda^{n+2} + a\lambda^{n+1} + b\lambda^n = 0$.
If $\lambda = 0$, then nothing is added to the particular integral. Hence, take
$\lambda \neq 0$ and cancel $\lambda^n$ to get the auxiliary equation: $\lambda^2 + a\lambda + b = 0$ with
two roots $\lambda_1$ and $\lambda_2$. The complementary function is
$y_n = A_1\lambda_1^n + A_2\lambda_2^n$    (6)

The form of (6) depends on whether $\lambda_1$ and $\lambda_2$ are real or complex.
Case: $a^2 > 4b$. Real and distinct roots: $\lambda_1, \lambda_2 = \tfrac{1}{2}\{-a \pm \sqrt{a^2 - 4b}\}$.

The solution of (5) is then (6) with real $\lambda_1$ and $\lambda_2$ as specified. It is
very similar to the solution of the first-order case, with two power
function terms to consider instead of one.

Case: $a^2 = 4b$. Real and equal roots: $\lambda = -\tfrac{1}{2}a = \pm\sqrt{b}$.

The equation (5) is then: $y_{n+2} - 2\lambda y_{n+1} + \lambda^2y_n = 0$. Only one particular
solution ($y_n = \lambda^n$) is found so far. To get another, adopt the trick
which worked in 14.2 and try $y_n = n\lambda^n$:

$y_{n+2} - 2\lambda y_{n+1} + \lambda^2y_n = (n+2)\lambda^{n+2} - 2\lambda(n+1)\lambda^{n+1} + \lambda^2n\lambda^n$

$= \{(n+2) - 2(n+1) + n\}\lambda^{n+2} = 0$

i.e. $y_n = n\lambda^n$ is the second solution and the general solution of (5) is:
$y_n = (A_1 + nA_2)\lambda^n$.

Again, this is not very different from the solution of a first-order
equation. The sign and magnitude of $\lambda$ determines the form of $y_n$.
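
The equal-roots solution can be verified exactly with rational arithmetic. A minimal sketch, in which the double root and the two arbitrary constants are illustrative values only:

```python
from fractions import Fraction


def equal_root_solution(n, lam, a1, a2):
    """General solution y[n] = (A1 + n*A2) * lam**n for a double root lam."""
    return (a1 + n * a2) * lam ** n


lam = Fraction(-3, 4)                    # double root, so a = -2*lam and b = lam**2
a1, a2 = Fraction(2), Fraction(-5, 3)    # arbitrary constants A1 and A2

residuals = [
    equal_root_solution(n + 2, lam, a1, a2)
    - 2 * lam * equal_root_solution(n + 1, lam, a1, a2)
    + lam ** 2 * equal_root_solution(n, lam, a1, a2)
    for n in range(10)
]
print(residuals)
```

Every residual is exactly zero, whatever constants are chosen, because the bracket $(n+2) - 2(n+1) + n$ vanishes identically.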

Case: $a^2 < 4b$. Conjugate complex roots: $\lambda_1, \lambda_2 = \tfrac{1}{2}\{-a \pm i\sqrt{4b - a^2}\}$.

Write the roots in the form: $\lambda_1, \lambda_2 = r(\cos\theta \pm i\sin\theta) = re^{\pm i\theta}$

where $r\cos\theta = -\tfrac{1}{2}a$ and $r\sin\theta = \tfrac{1}{2}\sqrt{4b - a^2}$
i.e. where $r = \sqrt{b}$ and $\tan\theta = -\sqrt{4b - a^2}/a$

given in terms of the structural constants a and b of the equation
(5). The solution of (5) is then expressed:

$y_n = A_1\lambda_1^n + A_2\lambda_2^n = A_1r^ne^{in\theta} + A_2r^ne^{-in\theta}$
$= r^n\{A_1(\cos n\theta + i\sin n\theta) + A_2(\cos n\theta - i\sin n\theta)\}$
$= r^n(B_1\cos n\theta + B_2\sin n\theta)$

where $B_1 = A_1 + A_2$ and $B_2 = i(A_1 - A_2)$ are alternative arbitrary
constants. A further switch in arbitrary constants to $A$ and $\varepsilon$ is made:

$B_1 = A\cos\varepsilon$ and $B_2 = A\sin\varepsilon$

i.e. $A = \sqrt{B_1^2 + B_2^2}$ and $\varepsilon = \tan^{-1}(B_2/B_1)$.
So: $y_n = Ar^n(\cos n\theta\cos\varepsilon + \sin n\theta\sin\varepsilon)$
i.e. $y_n = Ar^n\cos(n\theta - \varepsilon)$    (7)


This is the general form of the solution of (5) in this case. It is to be
compared with the solution $y = Ae^{\alpha x}\cos(\omega x - \varepsilon)$ of the corresponding
differential equation in 14.4. Again (7) represents an oscillatory
variation in $y_n$ for n = 0, 1, 2, ... and the variation is of sinusoidal
form. The period is $2\pi/\theta$ where $\theta$ is given by $\tan\theta = -\sqrt{4b - a^2}/a$ in terms
of structural constants. The damping is described by the positive
constant $r = \sqrt{b}$, again given by the structure of the equation. The
oscillation is damped if $r < 1$, regular if $r = 1$ and explosive (anti-damped)
if $r > 1$. The amplitude $A$ and phasing $\varepsilon$ are arbitrary constants,
given by initial conditions. An example illustrates:


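(iii) $y_{n+2} - y_{n+1} + \tfrac{1}{2}y_n = (\tfrac{1}{2})^n$.
The complementary function is obtained from the auxiliary equation
$\lambda^2 - \lambda + \tfrac{1}{2} = 0$ with roots $\tfrac{1}{2}(1 \pm i)$. Hence:

$r\cos\theta = \tfrac{1}{2}$ and $r\sin\theta = \tfrac{1}{2}$ i.e. $r = \frac{1}{\sqrt{2}}$ and $\tan\theta = 1$.

Explicitly, $\theta = \frac{\pi}{4}$, the value of $\theta$ in the range 0 to $2\pi$ for $\tan\theta = 1$ (and
$\sin\theta = \cos\theta = \frac{1}{\sqrt{2}}$). The complementary function is:

$y_n = A\left(\frac{1}{\sqrt{2}}\right)^n\cos\left(\frac{n\pi}{4} - \varepsilon\right)$

for arbitrary $A$ and $\varepsilon$, to be given by initial conditions. This is a
damped oscillation with period $2\pi/\theta$, where $\theta = \pi/4$. The period is
8 units, i.e. the cycle is complete in the range from n = 0 to n = 8.
To find a particular integral, try $y_n = k(\tfrac{1}{2})^n$ for some k. Substitute:

$y_{n+2} - y_{n+1} + \tfrac{1}{2}y_n = \tfrac{1}{4}k(\tfrac{1}{2})^n$ to equal $(\tfrac{1}{2})^n$ for all n.

Hence k = 4. The complete solution of the equation is:

$y_n = 4(\tfrac{1}{2})^n + A\left(\frac{1}{\sqrt{2}}\right)^n\cos\left(\frac{n\pi}{4} - \varepsilon\right)$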
As n increases, the first (steady) term dies out, as does the damped 
oscillation of period 8. 
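
The constants of the oscillatory case can be computed from a and b, and the complete solution of example (iii) checked by substitution. The sketch below reads example (iii) as $y_{n+2} - y_{n+1} + \tfrac{1}{2}y_n = (\tfrac{1}{2})^n$, so that a = -1 and b = 1/2; the function names and the particular values of A and ε are illustrative.

```python
import cmath
import math


def oscillation_constants(a, b):
    """For y[n+2] + a*y[n+1] + b*y[n] = 0 with a*a < 4*b, return (r, theta, period)."""
    if a * a >= 4 * b:
        raise ValueError("roots are real; the oscillatory form (7) does not apply")
    root = complex(-a / 2, math.sqrt(4 * b - a * a) / 2)   # the root with positive imaginary part
    r, theta = cmath.polar(root)                           # modulus and argument
    return r, theta, 2 * math.pi / theta


def complete_solution(n, amplitude, eps, r, theta):
    """Particular integral 4*(1/2)**n plus the complementary function A*r**n*cos(n*theta - eps)."""
    return 4 * 0.5 ** n + amplitude * r ** n * math.cos(n * theta - eps)


r, theta, period = oscillation_constants(-1.0, 0.5)
amplitude, eps = 1.7, 0.4                                  # arbitrary constants A and epsilon

residuals = [
    complete_solution(n + 2, amplitude, eps, r, theta)
    - complete_solution(n + 1, amplitude, eps, r, theta)
    + 0.5 * complete_solution(n, amplitude, eps, r, theta)
    - 0.5 ** n
    for n in range(20)
]
print(r, theta, period, max(abs(x) for x in residuals))
```

Taking the argument of the root with positive imaginary part places θ between 0 and π, so the sign of a is handled automatically.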

Linear difference equations with constant coefficients and of order 
higher than the second are solved in the same way. It is necessary 
to find the roots (more than two in number) of an auxiliary equation 
of higher order than a quadratic. The complementary function in 
the solution then contains several terms, some of which may be of 
the form (6) with real A’s and others of the oscillatory form (7). Just 
as the operator D is of use in solving differential equations (14.5), so 
now an operator can be introduced in the solution of difference 
equations. It is the shift operator $E$ defined so that $Ey_n = y_{n+1}$, i.e. $E$
is the operation of getting from $y_n$ to the next value $y_{n+1}$ in sequence.
This is examined in 14.9 Ex. 17. 

Simultaneous linear difference equations in several variables can 
be handled on the same lines as for differential equations. A simple 
case is the pair of linear difference equations in two discrete variables
$y_n$ and $z_n$:

$y_{n+1} = a_{11}y_n + a_{12}z_n$ and $z_{n+1} = a_{21}y_n + a_{22}z_n$    (8)

These are such that $y_n$ and $z_n$ each satisfies the same difference
equation of the second order. Both variables follow the same type of
path; for example, both may have an oscillation of the same period
and damping. See 14.9 Ex. 13.


14.7. Laplace Transforms. The linear (algebraic) transformation has
the additive property that the transform of a linear combination is
the linear combination of the separate transforms (14.1 above). The
same property holds for the derivative $f'(x)$ of a function $f(x)$, taking
this as a transform, by the operator D, of one function into another.
If $f_1(x)$ has derivative $f_1'(x)$ and $f_2(x)$ derivative $f_2'(x)$, then
$\lambda_1f_1(x) + \lambda_2f_2(x)$ has derivative $\lambda_1f_1'(x) + \lambda_2f_2'(x)$ for any constants $\lambda_1$
and $\lambda_2$. Other transforms from one function to another have this
basic additive property; one of them is now considered. It is the
Laplace Transform named after Laplace (1749-1827) and it has many
practical uses.


DEFINITION: The Laplace Transform of the function $y(x)$ defined
for $x > 0$ is:

$\bar{y}(p) = \int_0^\infty e^{-px}y(x)\,dx = \lim_{k\to\infty}\int_0^k e^{-px}y(x)\,dx$

defined for those real values of p for which the infinite integral exists.


The choice of notation for the new function of p, derived as the
Laplace Transform of a given function $y(x)$, is not an easy one to
make.* The notation, adopted here, is in quite general use and it does
serve to stress, by putting a 'bar' over y, that y is transformed into a
new function $\bar{y}$.

Given a function $y(x)$, or a class of functions, it is necessary first to
determine whether the infinite integral for $\bar{y}(p)$ exists. It may exist
for some functions and not for others, or for some values of p and not
for others. Note that $e^{-px} \to 0$ as $x \to \infty$ so rapidly when p is positive
(12.3 above) that it usually outweighs any tendency for $y(x)$ to
increase with x. Hence, the Laplace Transform is to be expected to
exist for most ordinary functions if p is positive (and sufficiently
large). The transform fails only when p is small (or negative) and/or
for such unusual functions as $y(x) = e^{x^2}$. Some examples illustrate
how $\bar{y}(p)$ is to be obtained from the definition:


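(i) If $y(x) = 1$ (constant), then $\bar{y}(p) = \int_0^\infty e^{-px}\,dx = \left[-\frac{1}{p}e^{-px}\right]_0^\infty = \frac{1}{p}$

since $e^{-px} \to 0$ as $x \to \infty$ (p > 0). So $y(x) = 1$ has Laplace Transform
$\bar{y}(p) = \frac{1}{p}$ (p > 0).

(ii) If $y(x) = x$, then $\bar{y}(p) = \int_0^\infty xe^{-px}\,dx$. Integrate by parts:

$\int xe^{-px}\,dx = x\int e^{-px}\,dx - \int (Dx)\left(\int e^{-px}\,dx\right)dx = -\frac{1}{p}xe^{-px} + \frac{1}{p}\int e^{-px}\,dx$

i.e. $\int xe^{-px}\,dx = -\frac{1}{p}e^{-px}\left(x + \frac{1}{p}\right)$.

So: $\int_0^\infty xe^{-px}\,dx = \frac{1}{p^2}$ since $e^{-px} \to 0$ and $xe^{-px} \to 0$ as $x \to \infty$ (p > 0).

The Laplace Transform of $y(x) = x$ is $\bar{y}(p) = \frac{1}{p^2}$ (p > 0).

* See J. C. Jaeger: An Introduction to the Laplace Transformation (Methuen, 1949),
p. vi.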


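(iii) If $y(x) = x^n$, then $\bar{y}(p) = \int_0^\infty x^ne^{-px}\,dx = I_n$ (say). Now:

$\int x^ne^{-px}\,dx = x^n\int e^{-px}\,dx - \int (Dx^n)\left(\int e^{-px}\,dx\right)dx = -\frac{1}{p}x^ne^{-px} + \frac{n}{p}\int x^{n-1}e^{-px}\,dx$.

Since $x^ne^{-px} \to 0$ as $x \to \infty$, insertion of the bounds of integration gives:

$I_n = \frac{n}{p}I_{n-1}$ (p > 0).

So: $I_n = \frac{n}{p}I_{n-1} = \frac{n}{p}\cdot\frac{n-1}{p}I_{n-2} = \dots = \frac{n!}{p^n}I_0 = \frac{n!}{p^{n+1}}$,
since $I_0 = \int_0^\infty e^{-px}\,dx = \frac{1}{p}$ as in (i) above. Hence, the Laplace Transform
of $y(x) = x^n$ is $\bar{y}(p) = \frac{n!}{p^{n+1}}$ (p > 0).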
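
The reduction from $I_n$ to $I_{n-1}$ in example (iii) is a recurrence that can be run exactly. A small sketch using rational arithmetic; the function name is an illustrative choice:

```python
from fractions import Fraction
from math import factorial


def transform_of_power(n, p):
    """I_n built up from I_0 = 1/p by the reduction I_k = (k/p) * I_(k-1)."""
    value = Fraction(1) / Fraction(p)
    for k in range(1, n + 1):
        value = Fraction(k) / Fraction(p) * value
    return value


table = [transform_of_power(n, 2) for n in range(6)]    # transforms of 1, x, ..., x^5 at p = 2
print(table)
```

Each entry equals $n!/p^{n+1}$ at p = 2; the first two entries reproduce the transforms 1/p and 1/p² of examples (i) and (ii).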


The basic additive property of Laplace Transforms follows from 
the definition: 


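THEOREM: If $y_1(x)$ and $y_2(x)$ have Laplace Transforms $\bar{y}_1(p)$ and
$\bar{y}_2(p)$, then $\lambda_1y_1(x) + \lambda_2y_2(x)$ has Laplace Transform $\lambda_1\bar{y}_1(p) + \lambda_2\bar{y}_2(p)$
for any constants $\lambda_1$ and $\lambda_2$.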


The proof is immediate, depending only on the additive property of 
integrals. 

Another consequence of the definition is that, if $\bar{y}(p)$ exists for a
given $y(x)$, then it is unique.* The practical problem is to find it, to
write it down as quickly as possible. This is precisely the problem of
the handling of derivatives in practice. First, it is to be checked that
a given function $y(x)$ has a Laplace Transform, a similar problem
(involving limits) to that of determining whether a derivative exists.
Second, a set of standard forms for the Laplace Transforms of the
simplest functions is to be obtained from the definition. Third,
certain operational rules are established, again from the definition,
with the object of facilitating the writing of Laplace Transforms of
more complicated functions.

* The converse is also true though not proved here. Given a function $\bar{y}(p)$, then a
unique $y(x)$ exists with $\bar{y}(p)$ as its Laplace Transform.

The standard forms include the following:

y(x)        Laplace Transform $\bar{y}(p)$            y(x)              Laplace Transform $\bar{y}(p)$

$x^n$       $\frac{n!}{p^{n+1}}$  (p > 0)             $\sin\alpha x$    $\frac{\alpha}{p^2 + \alpha^2}$  (p > 0)

$e^{ax}$    $\frac{1}{p - a}$  (p > a)                $\cos\alpha x$    $\frac{p}{p^2 + \alpha^2}$  (p > 0)

Here $a$ and $\alpha$ are constants. The proof of the first of these forms is
given in example (iii) above. As particular cases (n = 0, 1), note that
1 has Laplace Transform $1/p$ and that x has Laplace Transform $1/p^2$.
The other standard forms follow from the relevant integrals, see
14.9 Exs. 19 and 20.
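
The standard forms can be compared with a direct numerical evaluation of the defining integral. The following is a rough sketch only: it truncates the infinite integral at a finite upper bound and applies Simpson's rule, and the bound, step count and test values of p, a and α are arbitrary choices made here.

```python
import math


def laplace_numeric(y, p, upper=60.0, steps=60000):
    """Approximate the integral of exp(-p*x) * y(x) over [0, upper] by Simpson's rule.

    steps must be even; upper must be large enough for the neglected tail to be negligible.
    """
    h = upper / steps
    total = y(0.0) + math.exp(-p * upper) * y(upper)
    for k in range(1, steps):
        x = k * h
        weight = 4 if k % 2 else 2
        total += weight * math.exp(-p * x) * y(x)
    return total * h / 3


p = 2.0
power = laplace_numeric(lambda x: x ** 3, p)                 # standard form 3!/p^4
exponential = laplace_numeric(lambda x: math.exp(x), p)      # standard form 1/(p - 1), needs p > 1
sine = laplace_numeric(lambda x: math.sin(3 * x), p)         # standard form 3/(p^2 + 9)
cosine = laplace_numeric(lambda x: math.cos(3 * x), p)       # standard form p/(p^2 + 9)
print(power, exponential, sine, cosine)
```

The exponential case shows why the restriction p > a matters: for p not exceeding a the integrand does not die away and no finite truncation would settle down.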
The first of the operational rules is the basic additive property. Extended
to several terms, the rule is that $\lambda_1y_1(x) + \lambda_2y_2(x) + \dots + \lambda_ny_n(x)$
has Laplace Transform $\lambda_1\bar{y}_1(p) + \lambda_2\bar{y}_2(p) + \dots + \lambda_n\bar{y}_n(p)$, where $\lambda_r$ is
any constant and $\bar{y}_r(p)$ is the Laplace Transform of $y_r(x)$ for r = 1, 2,
... n. For example, the Laplace Transform of $(1 + x)^2 = 1 + 2x + x^2$ is
$\frac{1}{p} + \frac{2}{p^2} + \frac{2}{p^3}$. Another operational rule easily established (14.9 Ex. 21)
is that, if $y(x)$ has Laplace Transform $\bar{y}(p)$ and if a is any constant,
then $e^{ax}y(x)$ has Laplace Transform $\bar{y}(p - a)$. This serves to extend
the standard forms to give $\frac{n!}{(p - a)^{n+1}}$ as the Laplace Transform of
$x^ne^{ax}$, $\frac{\alpha}{(p - a)^2 + \alpha^2}$ as that of $e^{ax}\sin\alpha x$ and $\frac{p - a}{(p - a)^2 + \alpha^2}$ as that of
$e^{ax}\cos\alpha x$ (all for p > a).

The Laplace Transform is designed to handle functions of a
continuous variable, and particularly exponential and sinusoidal
functions of the kind met in solving differential equations. From the
standard forms above, it is evident that the transform serves to
replace such functions by algebraic expressions (ratios of polynomials)
in p. The Laplace Transform, in one of its main applications, reduces
a differential equation to an algebraic equation; it is often the best
way of solving differential equations when initial conditions are
specified. There is a corresponding transform designed to handle a
discrete variable, and to solve difference equations. It is the Generating
Function which transforms a given sequence $y_n$ (n = 0, 1, 2, ...) into
a function of s:

$\bar{y}(s) = \sum_{n=0}^{\infty} y_ns^n$.

As compared with the Laplace Transform, this has the power
function $s^n$ instead of the exponential $e^{-px}$ and an infinite series
instead of an infinite integral. It is also a transform with the basic
additive property: if the sequence $y_n$ has transform $\bar{y}(s)$ and if the
sequence $z_n$ has transform $\bar{z}(s)$, then the sequence $(\lambda_1y_n + \lambda_2z_n)$ has
transform $\lambda_1\bar{y}(s) + \lambda_2\bar{z}(s)$ for any constants $\lambda_1$ and $\lambda_2$.


14.8. Linear models. In many problems the variables are related 
among themselves by means of inter-dependent equations, usually 
differential or difference equations. A frequent case is that in which 
the variables are functions of time such that one variable is de- 
pendent on the sequence of past values of another variable. This is 
the case of a lagged dependence. Suppose that there are two variables 
in a system, $y(t)$ and $z(t)$ as functions of time $t$. Suppose further that
$y(t)$ depends on all past values of $z(t)$, the dependence being linear in
that the influences of each past time sum in their effect on $y(t)$ after
multiplication by an appropriate factor. These factors, for various
past times, make up a weighting function $w(\tau)$, where $\tau$ is the time
interval taken in the past. Hence, the current value $y(t)$ is the sum
of the past values $z(t - \tau)$, each weighted with $w(\tau)$, for all $\tau > 0$.
In limiting or integral form:

$y(t) = \int_0^\infty w(\tau)z(t - \tau)\,d\tau$    (1)

Similarly, $z(t)$ may have a lagged dependence on $y(t)$:

$z(t) = \int_0^\infty w'(\tau)y(t - \tau)\,d\tau$    (2)

where $w'(\tau)$ is another weighting function. The equations (1) and (2)
together make up a linear and inter-dependent system in two
variables. It is required to solve the equations to give the time-paths
of $y(t)$ and $z(t)$.

Such problems arise in many fields, e.g. to describe the behaviour 
of physical equipment, as in control systems in engineering, or of 
organisms in ecological problems. Equally, they may express the 
behaviour of units such as groups of firms or consumers in economics. 
Two examples illustrate.

(i) The speed of a steam turbine is regulated by a governor which
varies the amount of the steam valve opening. Because of inertia in
the turbine, the speed $y(t)$ depends on the valve opening $z(t - \tau)$ at
earlier times ($\tau > 0$). If the dependence is linear, then it appears as
(1) for some weighting function $w(\tau)$ determined by the properties
of the turbine. Further, the governor of the turbine is designed to
change the valve opening according to the discrepancy between the
turbine's speed and some desired speed, again with lags in the
operation of the governor. The valve opening $z(t)$ depends on the
turbine's speed $y(t - \tau)$ at earlier times ($\tau > 0$). Then (2) expresses
such a dependence, taken as linear, where the weighting function
$w'(\tau)$ is given by the design of the governor. The problem is to solve
(1) and (2) to get the time-path $y(t)$, the speed of the turbine, and to
vary the design of the system to eliminate or minimise fluctuations
in $y(t)$.

(ii) Consider an economic problem of production and consumption
in the whole economy, ignoring (for simplicity here) the matter of
investment. Let $y(t)$ be the supply and $z(t)$ the demand for goods and
services, both in money values. Because of lags in the production
system, supply $y(t)$ depends on demand $z(t - \tau)$ at earlier times
($\tau > 0$), as shown by (1) in a linear model. Further, the money value
$y(t)$ is also the aggregate income in the economy, out of which the
demand $z(t)$ arises. Because of lags in the distribution and disposal
of incomes, demand $z(t)$ depends on income $y(t - \tau)$ at earlier times
($\tau > 0$), given by (2) in a linear model. The problem is to solve (1) and
(2) for the time-path of $y(t)$, production and income in the economy,
and to stabilise the economy by modifying the lag (weighting)
functions $w'(\tau)$ and $w(\tau)$ or otherwise.

A system which includes pairs of relations such as (1) and (2) is 
characterised by a feed-back; one variable is influenced by another 
and, in its turn, feeds back to regulate the other. In case (i), the
speed of the turbine depends on the valve opening but also feeds back
to regulate the valve opening. Similarly, in (ii), the simple mutual 
relations between supply and demand (production and consumption) 
constitute a feed-back system. Now consider a particular case, that 
of an exponential lag, or weighting function, in the dependence of 
one variable on another. This is not at all unrealistic: 

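(iii) Take $w(t) = \lambda e^{-\lambda t}$ ($\lambda$ constant) in (1). The constant $\lambda$ is designed
to make the sum (integral) of the weights equal to unity:

$\int_0^\infty w(t)\,dt = \lambda\int_0^\infty e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^\infty = 1$.

Substitute in (1) and change the variable of integration by writing
$x = t - \tau$ ($dx = -d\tau$):

$y(t) = \lambda\int_0^\infty e^{-\lambda\tau}z(t - \tau)\,d\tau = -\lambda\int_t^{-\infty} e^{-\lambda(t - x)}z(x)\,dx = \lambda e^{-\lambda t}\int_{-\infty}^t e^{\lambda x}z(x)\,dx$

i.e. $e^{\lambda t}y(t) = \lambda\int_{-\infty}^t e^{\lambda x}z(x)\,dx$.

Write and equate the derivatives of the two sides:
$e^{\lambda t}Dy(t) + \lambda e^{\lambda t}y(t) = \lambda e^{\lambda t}z(t)$

i.e. $Dy(t) + \lambda y(t) = \lambda z(t)$.

This is a differential equation, first-order and linear, to which the
given relation (1) reduces when the weighting function $w(t)$ is exponential.
In operator form, the differential equation is $(D + \lambda)y = \lambda z$,

i.e. $y = \frac{\lambda}{D + \lambda}z$.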

This example suggests, what is in fact usually true, that a relation
of the linear form (1) reduces to a differential equation in $y(t)$ and
$z(t)$, and that the equation is linear with constant coefficients. This is
the point to be pursued. Suppose that the variable $z(t)$ shows exponential
growth and/or regular oscillations, so that it is of sinusoidal
form as given in 14.4 above:

$z(t) = Ae^{\alpha t}\cos(\omega t + \varepsilon)$    (3)

Generally, (3) is an oscillation of period $\frac{2\pi}{\omega}$, damped or explosive
according to the sign of $\alpha$; $A$ and $\varepsilon$ fix the amplitude and phasing.
One particular case ($\alpha = 0$, $\omega \neq 0$) gives a regular oscillation of unchanged
amplitude. Another ($\alpha \neq 0$, $\omega = 0$) gives a steady exponential
growth or decline at the rate $\alpha$: $z = z_0e^{\alpha t}$ ($z_0 = A\cos\varepsilon$).

Instead of (3) in real terms, it is convenient to write the complex
variable:

$Z(t) = Ae^{i\varepsilon}e^{pt}$ for $p = \alpha + i\omega$    (4)
Then: $Z(t) = Ae^{\alpha t}e^{i\varepsilon}e^{i\omega t}$
$= Ae^{\alpha t}(\cos\varepsilon + i\sin\varepsilon)(\cos\omega t + i\sin\omega t)$ by (4) of 12.6
$= Ae^{\alpha t}\{\cos(\omega t + \varepsilon) + i\sin(\omega t + \varepsilon)\}$ by (2) of 12.5.
Hence, the real part of (4) is the sinusoidal variable (3), and this is
what we concentrate upon. The imaginary part of (4), $Ae^{\alpha t}\sin(\omega t + \varepsilon)$,
is similar but usually to be ignored. We can now operate upon (4)
according to the ordinary algebraic processes. First, if $t$ is subject to
a fixed delay $\theta$:
$Z(t - \theta) = Ae^{i\varepsilon}e^{p(t - \theta)} = e^{-p\theta}Ae^{i\varepsilon}e^{pt}$
i.e. $Z(t - \theta) = e^{-p\theta}Z(t)$    (5)
Second, using the derivative operator D:
$DZ(t) = Ae^{i\varepsilon}De^{pt} = Ae^{i\varepsilon}(pe^{pt}) = pZ(t)$
$D^2Z(t) = DpZ(t) = pDZ(t) = p^2Z(t)$.

Generally: $D^nZ(t) = p^nZ(t)$    (6)


The validity of this procedure can be checked (14.9 Exs. 25 and 26)
by showing, not only that $z(t)$ is the real part of $Z(t)$, but that
$z(t - \theta)$ is the real part of $Z(t - \theta)$ in (5) and that $z^{(n)}(t)$ is the real part
of $D^nZ(t)$ in (6). Hence:

THEOREM: The sinusoidal variable $z(t) = Ae^{\alpha t}\cos(\omega t + \varepsilon)$ is the real
part of $Z(t) = Ae^{i\varepsilon}e^{pt}$ ($p = \alpha + i\omega$), $z(t - \theta)$ is the real part of

$Z(t - \theta) = e^{-p\theta}Z(t)$
and $z^{(n)}(t)$ is the real part of $D^nZ(t) = p^nZ(t)$.

In terms of operators, the delay $\theta$ in $t$ in $z(t)$ corresponds to multiplying
$Z(t)$ by $e^{-p\theta}$, and the nth derivative of $z(t)$ is obtained by writing
$D = p$ in $D^nZ(t)$.
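
The two statements of the theorem, on delay and on differentiation, can be checked with complex arithmetic. In the sketch below the amplitude, phasing, damping, frequency, time and delay are illustrative numbers only, and the first derivative of z(t) is written out by hand for comparison.

```python
import cmath
import math

A, eps = 2.0, 0.3              # amplitude and phasing (illustrative values)
alpha, omega = -0.1, 1.2       # damping constant and angular frequency
p = complex(alpha, omega)      # p = alpha + i*omega


def z(t):
    """Sinusoidal variable z(t) = A e^{alpha t} cos(omega t + eps)."""
    return A * math.exp(alpha * t) * math.cos(omega * t + eps)


def dz(t):
    """First derivative of z(t), differentiated directly."""
    angle = omega * t + eps
    return A * math.exp(alpha * t) * (alpha * math.cos(angle) - omega * math.sin(angle))


def Z(t):
    """Complex variable Z(t) = A e^{i eps} e^{p t}."""
    return A * cmath.exp(1j * eps) * cmath.exp(p * t)


t, theta = 1.3, 0.45           # a time and a fixed delay
real_part_gap = z(t) - Z(t).real
delay_gap = z(t - theta) - (cmath.exp(-p * theta) * Z(t)).real
derivative_gap = dz(t) - (p * Z(t)).real
print(real_part_gap, delay_gap, derivative_gap)
```

All three gaps are zero to rounding error, so delaying by θ is the same as multiplying Z(t) by $e^{-p\theta}$ and differentiating is the same as multiplying by p, provided the real part is taken at the end.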




These results are to be applied to the lagged dependence (1) of 
y(t) on z(t). First, as a matter of notation or definition: 


DEFINITION: The transfer function of the lagged dependence (1) is
the Laplace Transform of the weighting function: $\bar{w}(p) = \int_0^\infty e^{-pt}w(t)\,dt$.

The Laplace Transform is defined in 14.7 for a real variable p but the
definition extends (14.9 Ex. 22) to the case where p is a complex
variable. Here the transfer function $\bar{w}(p)$ is taken as a function of a
complex variable p, to be identified as $p = \alpha + i\omega$, for the sinusoidal
variation considered.

In (1), let $z(t)$ be a sinusoidal variable, the real part of $Z(t) = Ae^{i\varepsilon}e^{pt}$
($p = \alpha + i\omega$) and let the corresponding $y(t)$ be the real part of $Y(t)$.
Then:

$Y(t) = \int_0^\infty w(\tau)Z(t - \tau)\,d\tau = \int_0^\infty w(\tau)e^{-p\tau}Z(t)\,d\tau$ by (5)

$= Z(t)\int_0^\infty e^{-p\tau}w(\tau)\,d\tau$

i.e. $Y(t) = \bar{w}(p)Z(t)$    (7)
Since $D = p = \alpha + i\omega$ for sinusoidal variables, this gives:

$Y(t) = \bar{w}(D)Z(t)$
or taking the real parts:

$y(t) = \bar{w}(D)z(t)$    (8)


This is a differential equation connecting the sinusoidal variables
$y(t)$ and $z(t)$ related by (1). The original relation is reduced to a
differential equation, as desired. Moreover, it is seen, from the
standard forms for Laplace Transforms (14.7), that all the usual
weighting functions $w(t)$ have Laplace Transforms which are ratios
of polynomials in p. Hence, if it happens (as expected) that $\bar{w}(p)$ is
the ratio $F(p) : G(p)$ of two polynomials, then (8) is:

$G(D)y(t) = F(D)z(t)$

and the differential equation is linear with constant coefficients. To
illustrate:


(iv) If $w(t) = \lambda e^{-\lambda t}$, then the transfer function is $\bar{w}(p) = \frac{\lambda}{p + \lambda}$
given by the standard form of 14.7. Hence, by (8), the differential
equation form of (1) is:

$y(t) = \frac{\lambda}{D + \lambda}z(t)$

for this particular weighting function. This agrees with the direct
method of example (iii).

To return to (7), we notice that the variation assumed for $z(t)$, the
real part of $Z(t)$, gives rise to a corresponding variation in $y(t)$ as the
real part of $Y(t)$. It is taken that $z(t)$ is a sinusoidal variable with
given period $2\pi/\omega$ and with given damping indicated by $\alpha$. Hence
$p = \alpha + i\omega$ is a given complex number, and $\bar{w}(p)$ is also a given complex
number. Write $\bar{w}(p) = \rho e^{i\phi}$, where $\rho$ and $\phi$ are the polar co-ordinates
of the complex number. Then by (7):

$Y(t) = \rho e^{i\phi}Ae^{i\varepsilon}e^{pt} = (\rho A)e^{i(\varepsilon + \phi)}e^{pt}$.

It follows that $y(t)$, the real part of $Y(t)$, is also a sinusoidal variable,
that it has the same period and damping as $z(t)$, given by $p = \alpha + i\omega$,
and that only the amplitude and phasing are changed. Hence:


THEOREM: A variable y(t) depends on z(t) by the relation (1), reducing
to the differential equation (8). If z(t) is a given sinusoidal
variable with period $2\pi/\omega$ and damping $\alpha$, then y(t) is a sinusoidal
variable with the same period and damping.


The change in the amplitude and phasing depends on the transfer
function $\bar{w}(p) = \rho e^{i\phi}$. The amplitude A of z(t) is multiplied by $\rho$ to
give the amplitude $\rho A$ of y(t). There is a shift of phase by amount
$\phi$ ($\epsilon$ becoming $\epsilon + \phi$) in passing from z(t) to y(t).
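As a numerical illustration of the theorem, the following is a minimal sketch in Python. It assumes an exponential weighting function $\lambda e^{-\lambda t}$, whose Laplace Transform by the standard form is $\lambda/(p + \lambda)$; the values of `lam`, `alpha`, `omega`, `A` and `eps` are hypothetical and chosen only for the example. It computes $\rho$ and $\phi$ from the transfer function and checks that the resulting y(t) satisfies the first-order equation $Dy + \lambda y = \lambda z$.

```python
import cmath
import math

lam = 2.0                  # hypothetical speed of response of the weighting function
alpha, omega = -0.1, 1.5   # hypothetical damping and angular frequency of z(t)
A, eps = 1.0, 0.3          # hypothetical amplitude and phase of z(t)

p = complex(alpha, omega)
w_bar = lam / (p + lam)            # transfer function evaluated at p = alpha + i*omega
rho, phi = cmath.polar(w_bar)      # polar co-ordinates: modulus rho, argument phi

def z(t):
    """Input: sinusoidal variable, the real part of A e^{i eps} e^{pt}."""
    return A * math.exp(alpha * t) * math.cos(omega * t + eps)

def y(t):
    """Output: same period and damping, amplitude rho*A, phase eps + phi."""
    return rho * A * math.exp(alpha * t) * math.cos(omega * t + eps + phi)

def residual(t, h=1e-5):
    """Left side minus right side of Dy + lam*y = lam*z, with D by central difference."""
    dy = (y(t + h) - y(t - h)) / (2 * h)
    return dy + lam * y(t) - lam * z(t)

for t in (0.0, 0.7, 3.0):
    print(t, residual(t))   # each residual should be close to zero
```

With these particular values the modulus $\rho$ is less than 1 and the argument $\phi$ is negative, i.e. the output is reduced in amplitude and lags the input.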

To complete the linear model, take (1) and (2) together, reducing 
to a pair of differential equations of form (8): 


$$y(t) = \bar{w}(D)\,z(t) \quad \text{and} \quad z(t) = \bar{w}'(D)\,y(t).$$


The common oscillatory movements of y(t) and z(t), i.e. the common
period and damping, are to be found by solving simultaneous differential
equations. The respective amplitudes and phasing depend
on initial conditions. This is a generalisation of a particular case
already considered. If the weighting functions are simple exponentials,
as in examples (iii) and (iv) above, then the differential equations
are both first-order. By (6) and (7) of 14.3, both y(t) and z(t) satisfy
the same second-order differential equation, i.e. they both have the
same time-path as regards period of oscillation and damping.

The characteristics of a linear model for variations over time can 
now be seen. One variable is given as a linear combination of others. 
Typically, a variable y(t) is a linear combination of past values of
another variable z(t), expressed in limiting form by the integral
shown in (1). The relation reduces to a differential equation, using
the transfer function of the relation as in (8). This is typically linear
with constant coefficients. It implies that any sinusoidal variation
in z(t), or any exponential growth, is reflected in a similar variation
in y(t). If there are two such relations in two variables, then the 
corresponding pair of differential equations are to be solved. When 
linear with constant coefficients, the solution gives the period and 
damping of the common variation in the two variables, i.e. the free 
or self-sustaining oscillation inherent in the system. Here ‘oscillation’ 
may take the particular form of an exponential growth or decline,
the case $\alpha \neq 0$, $\omega = 0$. In general, it is an oscillation of a determined
period and with a determined amount of damping.* 

Situations of this type are not uncommon. Certainly linear models 
are not as restricted in scope as might be thought. Nevertheless 
there are serious limitations to a linear model. Differential equations 
which are linear with constant coefficients have solutions confined 
to sinusoidal variables (including exponential growth as a particular 
case). Hence all the variables of the model have one and the same 
variation, alike in period and damping. They only differ by ampli- 
tude and phase, and then because of the ‘accident’ of initial con- 
ditions. There is no possibility of different oscillations for different 
variables — or even of an oscillation of other than the symmetric 
‘cosine’ form. There is no structural explanation of the amplitude 
or phasing of the oscillations. To break out of these limitations, we 
need to formulate the model in non-linear terms, with relations more 
complicated than (1) and with differential equations which are not 
linear with constant coefficients.


* The extension to a system of several equations in several variables is clear enough. 
There is also an alternative formulation, in discrete rather than continuous variables. 
The relation (1) is then a sum, and (8) becomes a difference equation. The solution of 
a system of difference equations is very similar to that of a system of differential 
equations. In particular, it is still limited to ‘cosine’ oscillations, though based on the 
power function instead of the exponential function. 




14.9. Exercises 


1. By writing the auxiliary equation, show that the solution of $D^2y + aDy = 0$
is $y = A + Be^{-ax}$, where A and B are arbitrary. More generally, show that
$D^2y + aDy = \phi(x)$ has the same solution as the first-order equation

$$Dy + ay = \psi(x),$$
where $\psi(x) = \int \phi(x)\,dx + A$.

2. If $\alpha$ is a constant, show that a particular integral of $D^2y + aDy + by = \alpha$ is
$\bar{y}(x) = \alpha/b$. Generalise for any differential equation with constant coefficients.

3. Write the auxiliary equation of $D^3y + aD^2y + bDy + cy = 0$ and exclude
the possibility of multiple roots. Show that the differential equation may have
a solution which is the sum of three steady exponential terms of form $e^{\lambda x}$.
Otherwise, show that it must have one such term and a sinusoidal term.
Illustrate by showing that $D^3y - D^2y + 2y = 0$ has solution

$$y = Ae^{-x} + Be^{x}\cos(x - \epsilon),$$
A, B and $\epsilon$ being arbitrary.

*4. Legendre polynomials. Consider the differential equation, linear with
non-constant coefficients:

$$D^2y - \frac{2x}{1 - x^2}\,Dy + \frac{n(n+1)}{1 - x^2}\,y = 0 \quad (n \text{ a positive integer}).$$

Show that $y = x$ is a solution when n = 1, $y = \tfrac{1}{2}(3x^2 - 1)$ when n = 2 and
$y = \tfrac{1}{2}x(5x^2 - 3)$ when n = 3. Generally, show that $y = P_n(x)$ is a solution, where

$$P_n(x) = \frac{1}{2^n\,n!}\,D^n(x^2 - 1)^n,$$

a polynomial of degree n. These are Legendre polynomials, named after
Legendre (1752-1833).

*5. Bessel functions. A new function, of a type appearing frequently in
mathematical physics and named after Bessel (1784-1846), is defined by the 
linear differential equation with non-constant coefficients : 


1 *) = 
Dy +2Dy+(1 arr y =90. 
Show that 


y=e-2 (0) ες.) ---+(-1) ate)" + 


is absolutely convergent and satisfies the differential equation ($x \neq 0$). (Assume
that the series can be differentiated term by term.) This is the expansion of the 
Bessel function, a case of the hypergeometric series (12.9 Ex. 31). 
6. Write the successive derivatives $De^{\lambda x}$, $D^2e^{\lambda x}$, ... and check that
$$F(D)\,e^{\lambda x} = F(\lambda)\,e^{\lambda x}$$
where F(D) is a polynomial. Similarly check that
$$F(D)\,(ye^{\lambda x}) = e^{\lambda x}F(D + \lambda)\,y,$$
where y is a given function of x.


7. If y is a polynomial in x, illustrate how F(D)y can be obtained by showing that

$$(1 - D^2)(ax^2 + bx + c) = ax^2 + bx + (c - 2a)$$

and

$$\frac{1}{1 - D^2}(ax^2 + bx + c) = ax^2 + bx + (c + 2a).$$

(In the second case, apply $1 - D^2$ to the right-hand side.) Also check that

$$\frac{1}{1 - D^2} = (1 - D^2)^{-1} = 1 + D^2 + D^4 + \dots$$

can be applied to $ax^2 + bx + c$ with the same result. Solve $D^2y - y + 2 = 0$ by the
operator method and check the solution of example (iii) of 14.3.

8. $F(D)y = 0$ is a linear differential equation with constant coefficients and
$F(D) = 0$ has a triple root $D = \lambda$. Show that $y = (A_1 + A_2x + A_3x^2)e^{\lambda x}$ is a
solution. Generalise to the case of any multiple root.

9. Resonance. Show that $D^2y - a^2y = e^x$ has solution
$$y = Ae^{ax} + Be^{-ax} + \frac{e^x}{1 - a^2},$$
and that $D^2y + a^2y = \cos x$ has solution
$$y = A\cos(ax - \epsilon) - \frac{\cos x}{1 - a^2}.$$
Why does this fail when $a = \pm 1$? This is the phenomenon of resonance.

10. Simultaneous differential equations. In general, a pair of linear first-order
equations is:
$$\alpha_{11}Dy + \alpha_{12}Dz + \beta_{11}y + \beta_{12}z = 0 \quad \text{and} \quad \alpha_{21}Dy + \alpha_{22}Dz + \beta_{21}y + \beta_{22}z = 0.$$
If the determinant of the $\alpha$'s is not zero, show that this reduces to form (6) of
14.3, and express the a's of this form in terms of the $\alpha$'s and $\beta$'s.

11. Show that (6) of 14.3 have an oscillatory solution if $\Delta > \tfrac{1}{4}(a_{11} + a_{22})^2$,
that y and z, while differing in amplitude and phase, have the same period $2\pi/\omega$,
where $\omega^2 = \Delta - \tfrac{1}{4}(a_{11} + a_{22})^2$, and that both are similarly damped if $a_{11} + a_{22} < 0$.

12. Solve (6) of 14.3 by an alternative method: write z = ky and show that
$Dy = \lambda y$, with solution $y = e^{\lambda x}$, $z = ke^{\lambda x}$, where $\lambda$ and k are given by:

$$a_{11} + ka_{12} = \frac{1}{k}a_{21} + a_{22} = \lambda.$$
Show that there are two values of each of $\lambda$ and k and complete the solution.

13. Simultaneous difference equations. In (8) of 14.6, eliminate $z_n$ and show that:

$$y_{n+2} - (a_{11} + a_{22})\,y_{n+1} + \Delta\,y_n = 0$$
where
$$\Delta = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.$$

Show that $z_n$ satisfies the same second-order equation. Examine the nature of
the solution.

14. Show that $y_{n+2} + ay_{n+1} + by_n = \phi(n)$ in terms of $y_n$ and two subsequent
values is equivalent to $y_n + ay_{n-1} + by_{n-2} = \psi(n)$ in terms of $y_n$ and two
preceding values, provided that $\psi(n) = \phi(n - 2)$.

15. Show that ¥n42+Yn41 +3Y¥n =0 has a solution similar to that of example 
(iii) of 14.6 but with the shorter period 3. Show that y,42 —Yni1+Yn_ =0 has a 
solution which is a regular oscillation of period 6. 

16. Consider $y_{n+2} - a^2y_n = 0$ given $y_0$ at n = 0, $y_1$ at n = 1. Show that this can
be regarded as two independent first-order equations giving $y_n = y_0a^n$ (n = 0,
2, 4, ...) and $y_n = y_1a^{n-1}$ (n = 1, 3, 5, ...). Solve the second-order equation and
reconsider the results.




*17. The shift operator E. Express a linear difference equation with constant
coefficients as $F(E)\,y_n = \phi(n)$, where F(E) is a polynomial in the shift operator
E (defined $Ey_n = y_{n+1}$). Hence get the complementary function from the
auxiliary equation F(E) = 0 and express the particular integral as

$$\bar{y}_n = \frac{1}{F(E)}\,\phi(n).$$

*18. Write $E\lambda^n$, $E^2\lambda^n$, $E^3\lambda^n$, ... in terms of $\lambda^n$ and show that $F(E)\lambda^n = F(\lambda)\lambda^n$.
Hence show that the particular integral of $y_{n+2} - y_{n+1} + \tfrac{1}{2}y_n = (\tfrac{1}{2})^n$ is $\bar{y}_n = 4(\tfrac{1}{2})^n$.
See example (iii) of 14.6.

19. From the definition, show that the Laplace Transform of $e^{ax}$ is

$$\frac{1}{p - a} \quad (p > a).$$
20. Integrate by parts to show:
$$\int e^{-px}\sin ax\,dx = -\frac{1}{a}e^{-px}\cos ax - \frac{p}{a}\int e^{-px}\cos ax\,dx$$
and
$$\int e^{-px}\cos ax\,dx = \frac{1}{a}e^{-px}\sin ax + \frac{p}{a}\int e^{-px}\sin ax\,dx.$$

Hence obtain the Laplace Transforms of sin ax and cos ax.
21. Show that $e^{ax}y(x)$ has Laplace Transform $\int_0^\infty e^{-qx}y(x)\,dx = \bar{y}(q)$ where
$q = p - a$.
*22. Laplace Transform for complex p. Show that the Laplace Transform
$\bar{y}(p)$ of y(x) can be defined for complex p, having real part
$\int_0^\infty e^{-\alpha x}\cos\omega x\; y(x)\,dx$ and imaginary part
$-\int_0^\infty e^{-\alpha x}\sin\omega x\; y(x)\,dx$, where $p = \alpha + i\omega$.
*23. Apply the result of Ex. 21 to show that, if $y_1(x) = e^{ax}\cos bx$ and
$y_2(x) = e^{ax}\sin bx$, then
$$\bar{y}_1(p) = \frac{p - a}{(p - a)^2 + b^2} \quad \text{and} \quad \bar{y}_2(p) = \frac{b}{(p - a)^2 + b^2}.$$
Write $\alpha = a + ib$ and show that the Laplace Transform of the complex variable
$e^{\alpha x} = e^{ax}(\cos bx + i\sin bx)$ is:
$$\frac{(p - a) + ib}{(p - a)^2 + b^2} = \frac{1}{p - \alpha},$$
i.e. that the standard form for $e^{\alpha x}$ holds whether $\alpha$ is real or complex.
*24. Fourier Transform. Write $Y(p) = \int_0^\infty e^{ipx}y(x)\,dx$, the Fourier Transform
of y(x), named after Fourier (1768-1830). Show that Y(p) is a particular case
of a Laplace Transform: $Y(p) = \bar{y}(-ip)$. If $y(x) = \sin\alpha x$ and p is real, show
that the Laplace Transform $\bar{y}(p) = \dfrac{\alpha}{p^2 + \alpha^2}$ and Fourier Transform
$Y(p) = \dfrac{\alpha}{\alpha^2 - p^2}$ are both real.
25. $Z(t) = Ae^{i\epsilon}e^{pt}$ ($p = \alpha + i\omega$) has real part $z(t) = Ae^{\alpha t}\cos(\omega t + \epsilon)$ and
imaginary part $Ae^{\alpha t}\sin(\omega t + \epsilon)$. Show that $e^{-p\theta}$ ($\theta$ a real constant) has real
part $e^{-\alpha\theta}\cos\omega\theta$ and imaginary part $-e^{-\alpha\theta}\sin\omega\theta$. Deduce that
$$z(t - \theta) = Ae^{-\alpha\theta}e^{\alpha t}\cos(\omega t + \epsilon - \omega\theta)$$
is the real part of $e^{-p\theta}Z(t)$.
26. With z(t) and Z(t) as in Ex. 25, show that $z'(t) = \rho Ae^{\alpha t}\cos(\omega t + \epsilon + \eta)$,
where $\rho = \sqrt{\alpha^2 + \omega^2}$ and $\tan\eta = \omega/\alpha$, directly from z(t). Then express Z(t) in its
real and imaginary parts and write $p = \alpha + i\omega = \rho e^{i\eta} = \rho(\cos\eta + i\sin\eta)$ to show
that $z'(t)$ is the real part of pZ(t). Generalise by showing that the nth derivative
$z^{(n)}(t) = \rho^nAe^{\alpha t}\cos(\omega t + \epsilon + n\eta)$ and that it is the real part of $p^nZ(t)$, with $p^n$
written $\rho^ne^{in\eta} = \rho^n(\cos n\eta + i\sin n\eta)$.


CHAPTER 15 


SOME FORMAL DEVELOPMENT 


‘God created the natural numbers ; 
everything else is man’s handiwork.’ 
Kronecker (1823-91). 


15.1. From integers to real numbers. (Reference: Chapter 2.) An 
axiomatic development of the real number system is given here. It 
includes a brief account of why each particular formulation is ex- 
pressed in the way chosen. The exposition is carried as far as state- 
ments of the main basic properties which follow from the definitions 
adopted, but the properties are usually not formally established 
since the proofs are excessively tedious. 

(i) Natural Numbers. The development here is based on the system 
of axioms devised by Peano (1858-1932). A ‘natural number’ is taken 
as the primitive (undefined) concept and the idea of one natural num- 
ber as the ‘successor’ of another is a primitive (undefined) relation. 
The object of the exercise is to get a complete set of natural numbers
and to arrange them in sequence 1, 2, 3, ..., n, n + 1, ... . Each natural
number is the ‘successor’ of the one immediately before it in the 
sequence. In general, n +1 is the successor of n. This is, however, 
hurrying too quickly. Writing ‘n +1’ as the successor of ‘n’ begs the 
question of what we mean by ‘adding 1’. It is better to leave this 
question over, to be answered at the proper time, and to adopt a 
neutral symbol ‘$n^+$’ for the successor of n.

The following five axioms are laid down:* 


(1) Successor: each natural number n has a unique successor $n^+$.
(2) Existence of Natural Numbers: 1 is a natural number.
(3) First Natural Number: no n exists so that $n^+ = 1$.
(4) Uniqueness: if $n^+ = m^+$, then n = m.
(5) Completeness: if a set of natural numbers has the properties:
(i) it contains 1, and (ii) it contains $n^+$ when it contains n, then
the set comprises all natural numbers.


* Peano’s axioms in themselves characterise any progression starting with 1, e.g.

1, 3, 5, 7, ... where $n^+$ is ‘next odd integer after n’

1, 1/2, 1/4, 1/8, ... where $n^+$ is ‘half n’.
The particular sequence of natural numbers (1, 2, 3, 4, ...) follows by the specification
of $n^+$ as n + 1 and by writing 1 + 1 = 2, 2 + 1 = 3, ... .

Axioms (1), (2) and (3) between them ensure that there is a sequence 
of natural numbers starting with 1. Though the successor of a number 
in the sequence is unique by axiom (1), it is possible that two different 
natural numbers exist with the same successor. This is ruled out by 
axiom (4). Further, it is still possible that there exist natural num- 
bers outside the sequence. This is ruled out by axiom (5). Hence, by 
the axioms, the whole set of natural numbers can be written uniquely 
in sequence. Successive members of the sequence can be given 
appropriate labels: 1, 2, 3, ... on the established (decimal) notation. 
Here 2 is a symbol introduced to stand for $1^+$, 3 is a symbol for $2^+$,
and so on.

Axiom (5) which guarantees the completeness of the set (and 
sequence) of natural numbers is of particular importance. It expresses 
what is known as the principle of mathematical induction and it pro- 
vides one of the most powerful methods of proof known to mathe- 
maticians. 

Given the axioms, the essential definitions of addition, multiplica- 
tion and order are: 

Sums: to each pair of natural numbers m and n associate a unique 
natural number, written m+n, such that (for any m and n): 


(i) $n + 1 = n^+$ (successor of n)
and (ii) $m + (n + 1) = (m + n)^+$ (successor of m + n).


Here (i) identifies n + 1 as following immediately after n in the
sequence of natural numbers, i.e. 2 = 1 + 1 since 2 follows 1; 3 = 2 + 1
since 3 follows 2; and so on. Then (ii) serves to reverse the process,
enabling several steps to be taken in any order. For example, since
2 is 1 + 1, 3 is the successor of 1 + 1 which by (ii) is 1 + (1 + 1) = 1 + 2.
Hence, 2 + 1 = 1 + 2 and equally (1 + 1) + 1 = 1 + (1 + 1), all being 3.
Generally, m + n = n + m and (m + n) + p = m + (n + p) for any m, n
and p. These are the commutative and associative rules.

Products: to each pair of natural numbers m and n associate a unique
natural number, written m × n, such that (for any m and n):


(i) m × 1 = m and (ii) m × (n + 1) = m × n + m.
Products are identified as repeated sums:


n × 1 = n; n × 2 = n × (1 + 1) = n × 1 + n = n + n;
n × 3 = n × (2 + 1) = n × 2 + n = n + n + n;

and so on. It then follows, by bringing in the rules already established 
for sums, that similar commutative and associative rules hold for 
products: mn=nm and (mn)p=m (np). (The sign x can always be 
dropped when there is no ambiguity.) There is, further, a relation 
between sums and products. For example: m(n + 1) = mn + m by
(ii); another application of (ii) gives


m(n + 2) = m(n + 1 + 1) = m(n + 1) + m = mn + m + m = mn + m × 2;


and so on. In general, m(n + p) = mn + mp for any m, n and p, the
distributive rule.
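The recursive character of the definitions of sums and products can be sketched in Python. This is only an illustration under stated assumptions: natural numbers are modelled by Python's positive integers, and the helper names `succ`, `pred`, `add` and `mul` are hypothetical. Built-in arithmetic is used only inside `succ` and `pred`; everything else follows rules (i) and (ii).

```python
def succ(n):
    """The successor n+ of a natural number n."""
    return n + 1

def pred(n):
    """The unique k with succ(k) == n; defined only for n > 1 (axioms 3 and 4)."""
    assert n > 1
    return n - 1

def add(m, n):
    """Sum by rules (i) m + 1 = m+ and (ii) m + (k + 1) = (m + k)+."""
    if n == 1:
        return succ(m)
    return succ(add(m, pred(n)))

def mul(m, n):
    """Product by rules (i) m x 1 = m and (ii) m x (k + 1) = m x k + m."""
    if n == 1:
        return m
    return add(mul(m, pred(n)), m)

# Spot-check the commutative, associative and distributive rules on small cases.
small = range(1, 6)
commutative = all(add(m, n) == add(n, m) and mul(m, n) == mul(n, m)
                  for m in small for n in small)
associative = all(add(add(m, n), q) == add(m, add(n, q))
                  for m in small for n in small for q in small)
distributive = all(mul(m, add(n, q)) == add(mul(m, n), mul(m, q))
                   for m in small for n in small for q in small)
print(commutative, associative, distributive)
```

A finite check of this kind is no substitute for the proofs by induction (axiom 5); it only confirms the rules on the cases tried.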


Order: if m and n are such that m = n + p for some p, then n is said to
be less than m, written n < m. In particular, since $n^+ = n + 1$, this
definition of order implies that $n < n^+$. A natural number n is less
than its successor n + 1. The basic order is 1, 2, 3, 4, ... .

It follows almost immediately from these definitions that the set 
J+ of the natural numbers satisfies all the operational rules and order 
properties of 2.2, except that zero, negatives and reciprocals are 
lacking and that the order is not dense and not extended in both 
directions. 

(ii) Integers. From the set J+ of natural numbers (positive integers)
a wider set J of all integers (positive, negative, zero) is defined.† The
idea is that, given two natural numbers n and m, they define a
positive integer by the ‘difference’ m − n if n < m, they define zero by
the ‘difference’ m − n = 0 if n = m, and they define a negative integer
by the ‘difference’ m − n = −(n − m) if n > m. However, the question-begging
word ‘difference’ and notation m − n must be avoided in the
definition. This is done by writing the pair (m, n) of natural numbers
instead of m − n. Further, a difficulty must be overcome: there are
many pairs with the same ‘difference’. For example, to get the
negative integer −2, we have 1 − 3 = 2 − 4 = 3 − 5 = ... = −2. The
safe way of writing −2 is as the number pair (m, n) with m + 2 = n.
So:

DEFINITION: An integer is the pair of natural numbers (m, n) such
that +, × and < are defined for (m, n) and (p, q):
(m, n) + (p, q) = (m + p, n + q)
(m, n) × (p, q) = (mp + nq, mq + np)
(m, n) < (p, q) if m + q < n + p.
Then (m, n) is the positive integer m − n if n < m, the integer zero if
n = m, and the negative integer −(n − m) if n > m.
The definitions of +, × and < appear artificial. They are designed
to carry these operations over from natural numbers to integers.
For example:
(m, n) = −2 if m + 2 = n; (p, q) = 3 if p = q + 3.
So: (−2) × 3 = (m, n) × (p, q) = (mp + nq, mq + np)
= {m(q + 3) + (m + 2)q, mq + (m + 2)(q + 3)}
= (2mq + 3m + 2q, 2mq + 3m + 2q + 6)
= (r, s) where r + 6 = s (r − s = −6)
= −6.

† Historically, this was not the first step. It was easier to go from the natural
numbers to ratios p/q of natural numbers, i.e. to positive rationals. The concept of
negative numbers came later and it was not readily accepted. One difficulty was in the
rule of signs, still puzzling to children:

(+1) × (+1) = +1; (−1) × (−1) = +1; (+1) × (−1) = −1; (−1) × (+1) = −1.
The definition given here exhibits the rule of signs in the form:
(1, 0) × (1, 0) = (1, 0); (0, 1) × (0, 1) = (1, 0); (1, 0) × (0, 1) = (0, 1); (0, 1) × (1, 0) = (0, 1).

The set J+ of natural numbers is extended to the wider set J of 
integers; amongst the elements of J are the positive integers, the 
equivalents of the natural numbers of J+. It follows quite easily 
from the definition that J satisfies all the operational rules and order 
properties of 2.2, except that reciprocals are still lacking and that the 
order is not dense. The set J is an integral domain. 
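The pair construction can be tried out directly. The following Python sketch is an illustration only: the class name `IntPair` and the method `same_integer` are hypothetical, and the test for two pairs representing the same integer (m + q = n + p) is an assumption made explicit here, corresponding to pairs having the same ‘difference’.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntPair:
    """An integer written as a pair (m, n) of natural numbers, standing for m - n."""
    m: int
    n: int

    def __add__(self, other):
        # (m, n) + (p, q) = (m + p, n + q)
        return IntPair(self.m + other.m, self.n + other.n)

    def __mul__(self, other):
        # (m, n) x (p, q) = (mp + nq, mq + np)
        return IntPair(self.m * other.m + self.n * other.n,
                       self.m * other.n + self.n * other.m)

    def __lt__(self, other):
        # (m, n) < (p, q) if m + q < n + p
        return self.m + other.n < self.n + other.m

    def same_integer(self, other):
        # Assumed convention: pairs with the same 'difference' give the same integer.
        return self.m + other.n == self.n + other.m

minus_two = IntPair(1, 3)     # m + 2 = n
three = IntPair(4, 1)         # p = q + 3
product = minus_two * three   # a pair (r, s) with r + 6 = s, i.e. -6
print(product, product.same_integer(IntPair(1, 7)))
```

Only sums and products of natural numbers appear inside the methods; no subtraction is needed, which is the point of the definition.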

(iii) Rational Numbers. One further step is needed to eliminate the 
lack of reciprocals in J. It is a rather general process, known as the 
definition of a quotient field. In this case, if p and q are integers of J, 
the quotient p/q is formed to provide the system of rationals, a field 
of quotients. At the same time, the definition makes good the 
deficiency in the order; the rationals have all the order properties,
including that of density.

In the definition of rational numbers, the idea is to write the ‘ratio’
or ‘quotient’ of one integer by another: p ÷ q or p/q. To avoid question-begging,
these terms and notations cannot be used in the definition.
There is again the difficulty that many different pairs of integers
have the same quotient, e.g. the rational number 2/3 is got equally
from 2/3 = 4/6 = 6/9 = ... . The safe definition, therefore, is to write a rational
number as a pair of integers (m, n) with the convention added that
(m, n) = (p, q) if the integers mq and np are equal.†


DEFINITION: A rational number is the pair of integers (m, n), where
(m, n) = (p, q) if mq = np, such that +, × and < are defined:
(m, n) + (p, q) = (mq + np, nq)
(m, n) × (p, q) = (mp, nq)
(m, n) < (p, q) if mq < np for n and q positive.
Then (m, n) is written $\dfrac{m}{n}$ ($n \neq 0$), $\dfrac{m}{n} = \dfrac{p}{q}$ if mq = np and $\dfrac{m}{n} < \dfrac{p}{q}$ if mq < np
(n and q positive).
The reason for adopting the particular definitions for + and × is
clear in translation into the $\dfrac{m}{n}$ notation:
$$\frac{m}{n} + \frac{p}{q} = \frac{mq + np}{nq} \quad \text{and} \quad \frac{m}{n} \times \frac{p}{q} = \frac{mp}{nq}.$$
To be definite on ‘less than’ (<), it can always be
arranged that n and q are positive integers (in the denominators).


It follows easily from the definition that rational numbers satisfy
all the operational rules and order properties of 2.2. The set R of
rationals is an ordered field, containing within it the particular
rationals $\dfrac{m}{1}$ ($\dfrac{m}{n}$ with n = 1) which are equivalent to the integers m.
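The same exercise can be repeated for rationals as pairs of integers. In this Python sketch the function names `q_equal`, `q_add`, `q_mul` and `q_less` are hypothetical, pairs are plain tuples (m, n), and denominators are assumed positive wherever order is tested. It also illustrates density: halving the sum of two distinct rationals gives a rational strictly between them.

```python
def q_equal(a, b):
    """(m, n) = (p, q) if mq = np."""
    (m, n), (p, q) = a, b
    return m * q == n * p

def q_add(a, b):
    """(m, n) + (p, q) = (mq + np, nq)."""
    (m, n), (p, q) = a, b
    return (m * q + n * p, n * q)

def q_mul(a, b):
    """(m, n) x (p, q) = (mp, nq)."""
    (m, n), (p, q) = a, b
    return (m * p, n * q)

def q_less(a, b):
    """(m, n) < (p, q) if mq < np, for n and q positive."""
    (m, n), (p, q) = a, b
    assert n > 0 and q > 0
    return m * q < n * p

third, half = (1, 3), (1, 2)
total = q_add(half, third)                       # 1/2 + 1/3
between = q_mul(q_add(third, half), (1, 2))      # half the sum of 1/3 and 1/2
print(total, between)
```

Here only integer sums and products occur; division is never performed, in line with the remark that the condition must be written in terms of integers only.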


† The condition is written, as it must be, in terms of integers only, i.e. mq = np. It
will appear as m/n = p/q in the notation to be adopted.


(iv) Real Numbers. The development of the number system into a
complete ordered field is rounded off by the definition of the set R*
of real numbers. The need for completion appears on splitting the set
R of rationals into subsets L and G so that each rational is in L or in
G, each of L and G contains at least one rational, and each rational
of L is less than each rational of G. This is called a Dedekind Cut,
after Dedekind (1831-1916). There are three possibilities:† (i) L has a
greatest member a and G a least member b; (ii) L has a greatest
member a, or G a least member b, but not both; (iii) L has no greatest
member and G no least. Case (i) can be eliminated. If it holds, then
a < b by definition of the cut and so $\tfrac{1}{2}(a + b)$ is a rational which is in
neither L nor G, a contradiction. The other two cases remain;
examples are: (ii) L given as all rationals x ≤ 2 and G as all rationals
x > 2; (iii) L given as all rationals x ≤ 0 together with positive rationals x
such that $x^2 < 2$ and G as all positive rationals x such that $x^2 > 2$.

Hence, if we say that a Dedekind cut defines a number (as the
dividing point between L and G), the number may correspond to a
rational, but it may not. This suggests the definition of the wider
set R* of real numbers:


DEFINITION: A Dedekind Cut of the set R of rationals into subsets 
L and G so that 


(1) each rational is in L or in G
(2) each of L and G contains at least one rational, and 
(3) each rational of L is less than each rational of G 


defines a real number $\alpha$ as the point of division between L and G.


Hence, R* includes some $\alpha$ corresponding to rational numbers; other
$\alpha$ of R* do not so correspond and they are irrational real numbers.
As an example of an irrational real number (apart from $\sqrt{2}$ already
quoted), consider the equation $x^3 - 3x - 8 = 0$. If x = a is a rational
root, then (x − a) is a factor of $x^3 - 3x - 8$ and a is an integer dividing
8; the only possibilities are a = ±1, ±2, ±4 or ±8 and none satisfies the
equation. There is no rational root. A Dedekind Cut is given by
$x^3 - 3x - 8 < 0$ (x in L) and $x^3 - 3x - 8 > 0$ (x in G) as can be seen by
plotting a graph of $y = x^3 - 3x - 8$ for rational x. The real number $\alpha$
so defined is an irrational root of $x^3 - 3x - 8 = 0$.
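The cut defined by this cubic can be explored with exact rational arithmetic. The sketch below, in Python, uses the standard `fractions.Fraction` type; the names `f`, `in_L` and `narrow` are hypothetical. It repeatedly halves an interval whose lower end lies in L and whose upper end lies in G, so that the two ends close in on the dividing point without ever reaching it.

```python
from fractions import Fraction

def f(x):
    """The cubic x^3 - 3x - 8, evaluated exactly for rational x."""
    return x**3 - 3*x - 8

def in_L(x):
    """Membership of the lower subset L of the cut: f(x) < 0."""
    return f(x) < 0

def narrow(lo, hi, steps):
    """Halve the interval [lo, hi] repeatedly, keeping lo in L and hi in G."""
    assert in_L(lo) and not in_L(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_L(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = narrow(Fraction(2), Fraction(3), 20)
print(float(lo), float(hi))   # two rationals very close together, one in L and one in G
```

Since the equation has no rational root, `f(mid)` is never zero: every rational midpoint falls strictly in L or strictly in G, however long the halving is continued.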

From the definition it follows that real numbers do as well, 
algebraically, as the rationals ; they satisfy all the rules and properties 
of 2.2, making R* an ordered field. The important matter is the re- 
definition of +, x and <: 

† In case (i) there is a jump from L to G; in case (iii) a gap between L and G. A cut
in the set of integers gives rise to case (i), i.e. integers have jumps. A cut in the set of
rationals excludes (i) but (iii) is possible, i.e. rationals have gaps but no jumps. For
real numbers, only case (ii) arises and there are neither jumps nor gaps. This is what
we mean when we say that real numbers form a ‘continuum’.


Let $\alpha'$ be a real number given by a Dedekind Cut with $L'$ containing
rationals $a'$ and $G'$ rationals $b'$, where $a' < b'$. Let $\alpha''$ be another
real number, the subsets of the Dedekind Cut being $L''$ (rationals
$a''$) and $G''$ (rationals $b''$) where $a'' < b''$.


Addition of $\alpha'$ and $\alpha''$: write $a = a' + a''$, $b = b' + b''$ as sums of rationals.
Then all a make up a subset L, and all b a subset G. Since a < b, L and
G give a Dedekind Cut and the real number so specified is defined as
$\alpha = \alpha' + \alpha''$.
Multiplication of $\alpha'$ and $\alpha''$: if $\alpha' > 0$ and $\alpha'' > 0$ (i.e. $G'$ and $G''$ consist
of positive rationals only), then the Dedekind Cuts can be limited to
cuts of the non-negative rationals and there is no difficulty on signs
in multiplying rationals. Write $a = a' \times a''$ making up a subset L, and
write $b = b' \times b''$ making up a subset G, of the non-negative rationals.
Since a < b, L and G give a Dedekind Cut of the non-negative rationals
and the real number which corresponds is defined as $\alpha = \alpha' \times \alpha''$. If
$\alpha' < 0$ and/or $\alpha'' < 0$, then $\alpha = \alpha' \times \alpha''$ is defined in terms of the
corresponding non-negative numbers, e.g. $\alpha'\alpha'' = -\alpha'(-\alpha'')$ if $\alpha' > 0$,
$\alpha'' < 0$.
Order of $\alpha'$ and $\alpha''$: if $\alpha'$ and $\alpha''$ are different (i.e. $L'$ and $L''$ are
different subsets, as are $G'$ and $G''$), then $a' < b'$ and $a'' < b''$ imply
only two possibilities. Either all $a'$ of $L'$ are in $L''$ (and so all $b''$ of
$G''$ are in $G'$) in which case we define $\alpha' < \alpha''$; or all $a''$ of $L''$ are in
$L'$ (and so all $b'$ of $G'$ are in $G''$) in which case we define $\alpha'' < \alpha'$.
Once these re-definitions are made, it is a straight-forward but 
exceedingly tedious task to check that all the rules and properties of 
2.2 hold for real numbers. This task is not undertaken here. The 
important result is: 


THEOREM: If a Dedekind Cut (L and G) is made of the set R* of real
numbers, then it gives a unique real number $\alpha$ such that all real numbers
$a < \alpha$ are in L and all real numbers $b > \alpha$ are in G, and such that $\alpha$ belongs
either to L or to G.


Proof: Let $L'$ and $G'$ be respectively the set of rationals in L and G,
which are subsets of real numbers. There are two possibilities:

(i) $L'$ has a greatest member, the rational a. L may still contain a
real number x > a. Then the Dedekind Cut of rationals giving x must
contain in its lower subset some rationals between a and x; a contradiction.
Hence L has no real member greater than a, i.e. L has a as
its greatest member. A similar situation arises if $G'$ has a least
member, so that G has a least member, a rational b.

(ii) $L'$ has no greatest and $G'$ no least member. Then the Dedekind
Cut of rationals (comprising $L'$ and $G'$) gives a real number $\alpha$ which
is irrational, and which belongs either to L or to G. If $\alpha$ belongs to
L, it must be L’s greatest member; otherwise, there is a real $x > \alpha$
in L and so rationals between $\alpha$ and x belonging to $L'$, contradicting
the condition that $\alpha$ is the real number given by $L'$ and $G'$. Similarly,
if $\alpha$ belongs to G, it must be G’s least member. It follows, in both
cases (i) and (ii), that there is a real number (rational or irrational)
which is either the greatest of L or the least of G. It must be unique;
otherwise, if $\alpha'$ and $\alpha''$ are one the greatest of L and the other the
least of G, then $\tfrac{1}{2}(\alpha' + \alpha'')$ is neither in L nor in G, a contradiction.

Q.E.D.

The implication of the theorem is most important. A Dedekind
Cut of rational numbers gives either a rational or something new (an 
irrational). The same process applied to real numbers produces 
nothing new; a Dedekind Cut of real numbers always gives a real 
number, belonging to one or other of the two sets L and G of the Cut. 
The set R* of real numbers is complete, cut it which way we will. 
It follows that, if a set S of real numbers with a lower bound is given,
then a Dedekind Cut can be specified: x belongs to L if it is a lower 
bound of S, otherwise x belongs to G. The real number so defined by 
the Cut is the GLB of S. As such, it may or may not belong to S. A 
similar result holds for a LUB. 


15.2. Polynomials: the fundamental theorem of algebra. (Reference: 
Chapter 3.) The essential idea of a polynomial is that it is a sequence 
of coefficients, drawn from a given field F. In writing polynomials 
with rational coefficients, as in 3.3, F is the field R of rationals. F 
could equally well be the field of real (or complex) numbers giving 
polynomials with real (or complex) coefficients.†

DEFINITION: A polynomial over the field F is any sequence $(f_0, f_1, f_2, f_3, \dots)$
of elements of F with only a finite number of non-zero terms, and
subject to the rules of addition and multiplication:

$$(f_0, f_1, f_2, f_3, \dots) + (g_0, g_1, g_2, g_3, \dots) = (f_0 + g_0,\ f_1 + g_1,\ f_2 + g_2,\ f_3 + g_3,\ \dots) \tag{1}$$

$$(f_0, f_1, f_2, f_3, \dots) \times (g_0, g_1, g_2, g_3, \dots) = (h_0, h_1, h_2, h_3, \dots) \tag{2}$$

where $h_m = f_0g_m + f_1g_{m-1} + \dots + f_mg_0$.

† It is also possible to take F as an integral domain, e.g. the set of integers, giving
polynomials with integral coefficients. The particular expression (7) below does not
then apply.
As a convention, identify the polynomial $(f_0, 0, 0, 0, \dots)$ with the
element $f_0$ of F, e.g. with the rational $f_0$ when F is the field R. Then:

$$(f_0, f_1, f_2, f_3, \dots) = (f_0, 0, 0, 0, \dots) + (0, f_1, 0, 0, \dots) + (0, 0, f_2, 0, \dots) + \dots \quad \text{by (1)}$$

$$= (f_0, 0, 0, 0, \dots) + (f_1, 0, 0, 0, \dots) \times (0, 1, 0, 0, \dots) + (f_2, 0, 0, 0, \dots) \times (0, 0, 1, 0, \dots) + \dots \quad \text{by (2)}$$

i.e.
$$(f_0, f_1, f_2, f_3, \dots) = f_0 + f_1(0, 1, 0, 0, \dots) + f_2(0, 0, 1, 0, \dots) + \dots \tag{3}$$


Write x for the particular polynomial (0, 1, 0, 0, ...). Apart from the 
fact that it is one of the polynomials, we still leave x undefined. 
Then: 
$$x^2 = (0, 1, 0, 0, \dots) \times (0, 1, 0, 0, \dots) = (0, 0, 1, 0, \dots) \quad \text{by (2)}$$
$$x^3 = (0, 0, 1, 0, \dots) \times (0, 1, 0, 0, \dots) = (0, 0, 0, 1, 0, \dots)$$
and so on. Hence $x^m$, the product of x by itself (m times) is itself a
polynomial, that with 1 in the mth place and zero elements elsewhere.
Substituting in (3), we have the familiar notation for a polynomial:


$$(f_0, f_1, f_2, f_3, \dots) = f_0 + f_1x + f_2x^2 + f_3x^3 + \dots \tag{4}$$
Since there is only a finite number of non-zero elements among the 
f’s, there is a last non-zero element. Let it be $f_n$, and stop the sequence
here. This means that there can be zero values for any or all of
$f_0, f_1, f_2, \dots, f_{n-1}$, that $f_n \neq 0$, and that any subsequent elements can
be ignored as all zero. Hence the sequence of coefficients $f_0, f_1, f_2, \dots, f_n$
is obtained, for $n \ge 0$ and $f_n \neq 0$. The polynomial is then of nth degree
and it can be written from (4):
$$f(x) = f_0 + f_1x + f_2x^2 + \dots + f_nx^n \quad (n \ge 0,\ f_n \neq 0) \tag{5}$$
The set of all polynomials (5), as n and the f’s vary, is denoted F[x].
A further development is possible if we agree to impose another
rule on the set of polynomials. This defines scalar multiplication, i.e.
multiplication of f(x) by any element $\lambda$ of the field F:


$$\lambda(f_0, f_1, f_2, f_3, \dots) = (\lambda f_0, \lambda f_1, \lambda f_2, \lambda f_3, \dots) \tag{6}$$


The rule (6), which is additional to (1) and (2), applies to (5):
$$\lambda(f_0 + f_1x + f_2x^2 + \dots + f_nx^n) = (\lambda f_0) + (\lambda f_1)x + (\lambda f_2)x^2 + \dots + (\lambda f_n)x^n.$$


Put $\lambda = \dfrac{1}{f_n}$ as a particular case, $\lambda$ being an element of F:
$$\frac{1}{f_n}f(x) = \frac{f_0}{f_n} + \frac{f_1}{f_n}x + \frac{f_2}{f_n}x^2 + \dots + x^n$$
i.e.
$$g(x) = x^n + a_{n-1}x^{n-1} + \dots + a_1x + a_0 \tag{7}$$


where $a_r = \dfrac{f_r}{f_n}$ are elements of F for r = 0, 1, 2, ..., n − 1, and where
$g(x) = \dfrac{1}{f_n}f(x)$. For most purposes g(x) is as good as f(x); both are
polynomials and they differ only by a constant factor. Hence, (7) can
usually be taken as the general expression of a polynomial of degree n.
The rules (1) and (2), for sums and products, when applied to the 
general polynomial of form (5), become the familiar processes of 
elementary algebra: 
If $f(x) = f_0 + f_1x + f_2x^2 + \dots + f_nx^n$ and $g(x) = g_0 + g_1x + g_2x^2 + \dots + g_mx^m$
then
$$\begin{aligned} f(x) + g(x) &= (f_0 + g_0) + (f_1 + g_1)x + (f_2 + g_2)x^2 + \dots \\ f(x) \times g(x) &= f_0g_0 + (f_0g_1 + f_1g_0)x + (f_0g_2 + f_1g_1 + f_2g_0)x^2 + \dots \end{aligned} \tag{8}$$
The sum polynomial terminates with $f_nx^n$ if n > m, with $(f_n + g_n)x^n$ if
n = m or with $g_mx^m$ if n < m. The product polynomial terminates with
$f_ng_mx^{n+m}$. From (8), it follows that polynomials f(x) of the set F[x]
satisfy all the operational rules, except that reciprocals are not 
defined. F [x] is an integral domain. 
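
The rules (8), and the passage from (5) to the form (7), can be tried out
on coefficient sequences. The following is only a sketch, assuming that a
polynomial is stored as a Python list [f_0, f_1, ..., f_n] with rational
coefficients; the function names trim, poly_add, poly_mul and monic are
illustrative choices and do not come from the text.

```python
from fractions import Fraction


def trim(coeffs):
    """Drop trailing zeros so that the last entry is the leading coefficient."""
    coeffs = list(coeffs)
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def poly_add(f, g):
    """Sum rule of (8): add coefficients of like powers."""
    size = max(len(f), len(g))
    f = list(f) + [0] * (size - len(f))
    g = list(g) + [0] * (size - len(g))
    return trim(a + b for a, b in zip(f, g))


def poly_mul(f, g):
    """Product rule of (8): the coefficient of x^k is the sum of f_i * g_j with i + j = k."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def monic(f):
    """Form (7): multiply by 1/f_n (assumes f is not the zero polynomial)."""
    f = trim(f)
    lead = Fraction(f[-1])
    return [Fraction(c) / lead for c in f]


# (1 + x)(1 - x) = 1 - x^2
product = poly_mul([1, 1], [1, -1])
```

Exact rationals are used in monic so that dividing by f_n stays inside the
field of coefficients, as the text requires.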

The lack of reciprocals can be rectified, as in 15.1 (iii), by
constructing the quotient field of F[x], i.e. the set of ratios of polynomials.

DEFINITION: A rational fraction f(x)/g(x) is a pair of polynomials
{f(x), g(x)}, g(x) ≠ 0, such that + and × are defined:

{f(x), g(x)} + {φ(x), ψ(x)} = {f(x)ψ(x) + g(x)φ(x), g(x)ψ(x)}

{f(x), g(x)} × {φ(x), ψ(x)} = {f(x)φ(x), g(x)ψ(x)}.

The rational fraction f(x)/f(x) is identified as 1, i.e. f(x) cancels out. The
rational fraction f(x)/1 is identified as f(x), i.e. polynomials are
included in the set of rational fractions. To distinguish it from the set
F[x] of polynomials, the set of rational fractions is denoted F(x).

The reciprocal of f(x) can now be written as 1/f(x). For:

f(x) × 1/f(x) = {f(x), 1} × {1, f(x)} = {f(x), f(x)} = f(x)/f(x) = 1.


The set F(x) of all rational fractions satisfies the whole system of
operational rules of 2.2. While F[x] is only an integral domain, F(x)
is a field. There is no question of ordering rational fractions. F(x) is
not an ordered field.
The method of extending a given field F into a wider field F(x) of
rational fractions f(x)/g(x) is known as adjunction of x to the field F.
Here x is some element outside the given field F. First form sums of
products:

f(x) = f_0 + f_1 x + f_2 x^2 + ... + f_n x^n   (n ≥ 0),

by taking x both with itself and with elements of F, subject to a
re-definition of addition (+) and multiplication (×) as given by (8).
The result is an integral domain of elements f(x). Next, write f(x)/g(x)
where f(x) and g(x) are both of the form indicated, i.e. where

f(x)/g(x) = (f_0 + f_1 x + f_2 x^2 + ... + f_n x^n) / (g_0 + g_1 x + g_2 x^2 + ... + g_m x^m)   (9)

The set of elements (9) is a field, the adjunction of x to F. The
notation F(x) for this field indicates the adjunction. The original
field F has been expanded to a wider field F(x) by swallowing up x
together with all the necessary sums, products and quotients.
Adjunction is a very powerful tool for operating on fields. A simple
example: take the field R of rationals and form the field R(√2) by
adjunction of the outside element √2. Put x = √2 in (9) and note that
x^2 = 2, x^3 = 2√2, x^4 = 4, .... The general element of R(√2) is then of
the form:

(a + b√2)/(c + d√2),

which reduces to a rational number plus a rational multiple of √2
by multiplying numerator and denominator by c − d√2. This is done
in 2.3.
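
The reduction can be sketched numerically as follows. This is a minimal
illustration only: the function name divide_in_R_sqrt2 and the result
names p and q are assumptions made here, not notation from the text. It
multiplies numerator and denominator by c − d√2 and returns the rational
pair (p, q) with (a + b√2)/(c + d√2) = p + q√2.

```python
from fractions import Fraction


def divide_in_R_sqrt2(a, b, c, d):
    """Return rationals (p, q) with (a + b*sqrt2)/(c + d*sqrt2) = p + q*sqrt2."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    # (c + d*sqrt2)(c - d*sqrt2) = c^2 - 2 d^2, which is rational and is
    # zero only when c = d = 0, since sqrt2 is not rational.
    norm = c * c - 2 * d * d
    if norm == 0:
        raise ZeroDivisionError("the denominator c + d*sqrt2 is zero")
    # (a + b*sqrt2)(c - d*sqrt2) = (ac - 2bd) + (bc - ad)*sqrt2
    return (a * c - 2 * b * d) / norm, (b * c - a * d) / norm


# 1/sqrt2 = (1/2)*sqrt2
half_root = divide_in_R_sqrt2(1, 0, 0, 1)
```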

A more important example is to form R*(i) by adjunction of the
element i to the field R* of real numbers, where i^2 = −1. Put x = i in
(9), noting that x^2 = −1, x^3 = −i, x^4 = 1, .... The general element of
R*(i) is:

(a + ib)/(c + id) = x + iy

by multiplying numerator and denominator by c − id. Hence, as
indicated in 2.5, the field C of complex numbers is obtained by
adjunction of i to R*.

It remains to establish the Fundamental Theorem of Algebra that
every polynomial f(x) of positive degree, over the field F of rational,
real or complex numbers, is such that f(α) = 0 for some complex
number α. The consequence of this theorem (as shown in 3.7) is that
a polynomial of degree n > 0 has precisely n linear factors, and the
polynomial equation has precisely n roots. The theorem is stated in
3.7 for a polynomial with rational coefficients. It is equally true for
polynomials with real or complex coefficients.

THEOREM: If f(x) is any polynomial with rational, real or complex
coefficients, and of positive degree, then f(α) = 0 for some complex α.
All known proofs of this theorem are framed in terms which involve 
topological concepts. It would appear to be incapable of proof in 
purely algebraic terms. The following is no more than a sketch of a 
proof, the details and refinements of the topological construction 
being omitted. 

Proof: Write f(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0, where n is the degree
of the polynomial. Let α = re^{iθ} be any complex number, where r is the
absolute value and θ the argument (12.7 above), represented by a point P
on an Argand Diagram. Fig. 15.2 illustrates. As α varies, so do r
and θ and P moves around the plane. We
can get α to cover all complex values, and
P to move over the whole plane, by first
letting P describe an anti-clockwise circle
(given r, θ increasing from 0 to 2π) and
then increasing the radius of the circle from
O outwards (r increasing from 0).

Write β = f(α), a function of a complex variable α. Given α = re^{iθ},
then β is a specific complex number β = ρe^{iφ}, where ρ is its absolute
value and φ its argument, represented by a point P' corresponding to P
(Fig. 15.2). Then ρ and φ vary continuously with r and θ.
As P describes some continuous curve, then P' describes a continuous
curve. The relation β = f(α) maps P onto P', maps one curve onto the
other. Let P describe an anti-clockwise circle of radius r and let C_r
be the corresponding curve described by P'. Write m for the net
number of times C_r goes round O anti-clockwise, net in the sense that
a clockwise turn = a negative turn. So m depends on r, giving a
continuous function m(r).

[Fig. 15.2]

When r = 0, then α = 0 and the 'circle' is the point O. But

β = f(0) = a_0,

i.e. the 'curve' C_r (r = 0) is the single point O' (corresponding to O).
Hence m(0) = 0. When r is large, P describes a large circle (α large).
Then β = f(α) = α^n = r^n e^{inθ} approximately from the polynomial. As P
goes round its circle (θ increasing from 0 to 2π), P' is given by
β = r^n e^{inθ}, i.e. ρ = r^n (fixed) and φ = nθ (increasing from 0 to 2nπ).
Here, as P' describes C_r, it goes n times round O. So m(r) → n as
r → ∞. The curve C_r starts as a single point (r = 0) and finishes by
winding itself n times round O (r → ∞). At some r (and for some θ),
C_r must go through O. When it does, β = 0. The value of α which
corresponds is such that f(α) = 0. Q.E.D.
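
The winding count m(r) used in this sketch of a proof can be estimated
numerically. The following is an illustration only, under stated
assumptions: the polynomial is given by a list of coefficients (lowest
power first), the circle of radius r is sampled at a finite number of
points, and the curve C_r is assumed not to pass through O for the chosen
r. The names poly_eval and winding_number are hypothetical.

```python
import cmath
import math


def poly_eval(coeffs, z):
    """Evaluate a_0 + a_1 z + ... + a_n z^n by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def winding_number(coeffs, r, steps=4000):
    """Net number of anti-clockwise turns of f(r e^{i theta}) round O.

    The change of argument is accumulated over small steps of theta;
    this assumes f has no zero on the circle of radius r.
    """
    total = 0.0
    prev = poly_eval(coeffs, complex(r, 0.0))
    for k in range(1, steps + 1):
        theta = 2 * math.pi * k / steps
        cur = poly_eval(coeffs, cmath.rect(r, theta))
        total += cmath.phase(cur / prev)  # small signed change in argument
        prev = cur
    return round(total / (2 * math.pi))


# f(x) = x^2 + 1 has its zeros at distance 1 from O.
m_small = winding_number([1, 0, 1], 0.5)
m_large = winding_number([1, 0, 1], 2.0)
```

For f(x) = x^2 + 1 the count moves from 0 on the small circle to 2 on the
large one, so C_r must pass through O for some r between 0.5 and 2, in
line with the argument above.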


15.3. Sets, groups, fields and vector spaces. (Reference: Chapters 4, 
6 and 8.) 

(i) Set Theory. The axiomatic development of the theory of sets, 
as a foundation for all mathematics, can be achieved in various ways. 
The following is based on Zermelo’s system of axioms. The primitive 
(undefined) concepts are those of a ‘set’ and of the relation ‘belongs to’. 
The notation a ∈ A indicates that the object a belongs to the set A.
As consequent definitions:

(a) A is a subset of B, denoted A ⊂ B, if a ∈ A implies a ∈ B for all a.

(b) A equals B, denoted A = B, if A ⊂ B and B ⊂ A both hold.
Further, A is a proper subset of B if A ⊂ B holds and if A = B does not.
The following five axioms are laid down:

(1) Members: a set is fully determined by its members so that, if a
and b are equal (identical) objects, then a ∈ A implies b ∈ A.

(2) Pairing: if a and b are different objects, then there exists a
set {a, b} comprising just a and b, and no more.

(3) Selection: if the property Π is meaningful for members of a set
A, then there exists a subset of A comprising just those
members of A for which Π holds, and no more.

(4) Sum Set: if the set X comprises the sets A, B, C, ..., then there
exists a sum set SX = A + B + C + ... comprising just the
members of A, B, C, ..., and no more.

(5) Power Set: if the set X has subsets A, B, C, ..., then there exists
a power set PX = {A, B, C, ...} comprising all subsets of X.

Axiom (1) ensures that a set A can be denoted by its members 
{a, b, c, ...}, where the order of writing the members is immaterial
and where the number of members need not be finite. Axiom (2), in
which the 'objects' a and b can themselves be sets, is designed to
build up sets from individual objects or from smaller sets. 

A property is ‘meaningful’ if it is true or false (and not irrelevant) 
for members of A; for example the property ‘a is even’ is meaningful 
for the set of natural numbers but not for a set of persons. Axiom (4) 
relates to sets A, B, C,... which may be single objects as well as 
finite or infinite sets. If A, B, C,... are all the sets formed from the 
totality of elements in a universal set U, then {A, B,C, ...} exists as 
the power set of U by axiom (5).

A sequence of further definitions is obtained from the axioms. The
null set φ is the set comprising no member and the unit set {a} of an
object a is the set comprising just a and no more. The existence of
these sets is guaranteed by Axioms (2) and (3). If a and b are different
objects, write S = {a, b} as the pair set and take property Π as 'not a
member of S', giving a subset of S which is the null set. If Π is taken
as 'different from b', it gives a subset of S which is the unit set {a}.

Given two sets A and B, the union A ∪ B is the set comprising
just those members belonging either to A or to B. The union exists
in virtue of Axiom (4). The intersection A ∩ B is the set comprising
just those members belonging both to A and to B. Its existence is
guaranteed by Axiom (3) with Π as 'a ∈ B' taken over all members
a of A. The Cartesian product A.B is the set comprising all pairs
(a, b) where a ∈ A and b ∈ B. It exists because of Axiom (5), with the
power set P(A ∪ B) formed from the union A ∪ B. Take Π as
'S ∩ A and S ∩ B are each unit sets' and apply to members S of
P(A ∪ B). Axiom (3) then gives A.B as a subset of P(A ∪ B).
In other words, A ∪ B has a variety of subsets S and Π picks out
those consisting of two elements, one from A (given by S ∩ A) and
one from B (given by S ∩ B).

Consider {A, B, C, ...} as the power set of the universal set U.
One member is the null set φ. In addition to the union A ∪ B and the
intersection A ∩ B for any pair A and B, there can be defined the
complement A' of any set A, comprising just those members of U not
belonging to A. A rather complete set of properties is:

              Union (∪)                          Intersection (∩)

Closure       A ∪ B ⊂ U                          A ∩ B ⊂ U
Associative   A ∪ (B ∪ C) = (A ∪ B) ∪ C          A ∩ (B ∩ C) = (A ∩ B) ∩ C
Commutative   A ∪ B = B ∪ A                      A ∩ B = B ∩ A
Idempotent    A ∪ A = A                          A ∩ A = A
Bounds        A ∪ U = U                          A ∩ φ = φ
Distributive  A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Complement    A ∪ A' = U                         A ∩ A' = φ
Involution    (A')' = A


The proofs are straightforward and they are assisted by drawing
Venn Diagrams. The distributive rule

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

is illustrated in Fig. 15.3. The total shaded area in the top diagram
represents the left-hand set; the cross-shaded
area in the bottom diagram represents the
right-hand set. The two areas are the same.
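
For a small universal set the rules can also be checked exhaustively,
which is a mechanical counterpart to drawing the Venn Diagrams. The
sketch below is an illustration under assumptions: U is taken as a small
finite set, and the names power_set and check_set_laws are hypothetical.
It tests both distributive rules and the rules for the complement over
every choice of subsets A, B and C.

```python
from itertools import combinations


def power_set(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]


def check_set_laws(universe):
    """Return True if the listed rules hold for all subsets of the universe."""
    U = frozenset(universe)
    subsets = power_set(U)
    for A in subsets:
        complement = U - A
        # Complement and involution rules.
        if A | complement != U or A & complement != frozenset():
            return False
        if U - complement != A:
            return False
        for B in subsets:
            for C in subsets:
                # Both distributive rules.
                if A | (B & C) != (A | B) & (A | C):
                    return False
                if A & (B | C) != (A & B) | (A & C):
                    return False
    return True


laws_hold = check_set_laws({1, 2, 3})
```

A check of this kind confirms the rules for one finite U only; it does
not replace the general proofs.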

A finite set is a set which can be put into
one-one correspondence with {1, 2, 3, ..., n}
for some natural number n. Any other set
is an infinite set. There is nothing in the five 
axioms above to guarantee the existence of 
any infinite set. This requires an additional 
axiom : 

(6) Infinite sets: there is a set, which is 
infinite, comprising all natural numbers. 
Other infinite sets can be built up once this 
particular set is known. 

[Fig. 15.3: Venn diagrams illustrating the distributive rule]

A set is reflexive if it can be put into one-one correspondence with a
proper subset of itself. To establish that any infinite set is
reflexive requires a further additional axiom.
This relates to the Cartesian product, extended from the case of
A.B for two sets to that of A.B.C.... for any set of sets
{A, B, C, ...}. For example, with three sets, A.B.C is the set of
all triples (a, b, c) where a ∈ A, b ∈ B and c ∈ C. The axiom is:

(7) Axiom of Choice: if the sets A, B,C, ... are each non-empty 
and pairwise disjoint (no common element), then the Cartesian 
product A.B.C.... is non-empty. 

It can be shown that the axiom guarantees that, given a set X and 
hence its power set PX ={A, B, C,...} comprising all subsets, a 
definite element of each non-empty A, B, C, ... can be picked out. In 
its turn, this guarantees that a countably infinite sequence x_1, x_2, x_3, ...
can be selected from any infinite set X. The successive subsets of PX
taken for the selection of definite elements are X itself, X with x_1
removed, and so on. It then follows that all infinite sets are reflexive.

Algebra is largely concerned with sets which have the structure 
of a group. Development proceeds in two directions, to specialised 
sets with the structure of a ring and to other specialised sets with the 
structure of a vector space. These classifications are not exclusive; 
for example, a field is a special form of a ring and it may have 
properties which qualify it as a vector space. 

(ii) Groups. With an operation *, the definition of a group is: 


DEFINITION: A set G is a group if, for all elements a, b, c, ...:

(1) Closure: a * b ∈ G

(2) Associative: a * (b * c) = (a * b) * c

(3) Identity: there is an e ∈ G so that e * a = a

(4) Inverse: there is an a^{-1} ∈ G so that a^{-1} * a = e.
The following properties are established from the definition: 

(a) If a^{-1} is an inverse of a, so that a^{-1} * a = e, then a * a^{-1} = e.
For:

(b * a^{-1}) * (a * a^{-1}) = b * (a^{-1} * a) * a^{-1} = b * e * a^{-1}   by (2) and (4)
                            = b * a^{-1}                                  by (3).

Write b as an inverse of a^{-1}, so that b * a^{-1} = e by (4).
Then: e * (a * a^{-1}) = e
i.e. a * a^{-1} = e   by (3). Q.E.D.

† The seven axioms, including the Axiom of Choice, can be shown to be consistent
one with another. They are not necessarily complete for all types of sets used in set
theory. On the axiomatic basis of set theory, see Fraenkel: Abstract Set Theory
(North-Holland, 1953) and Fraenkel and Bar-Hillel: Foundations of Set Theory
(North-Holland, 1958).

(b) If e is an identity, so that e * a = a, then a * e = a.
For: a * e = a * (a^{-1} * a) = (a * a^{-1}) * a   by (4) and (2)
           = e * a = a                             by (a). Q.E.D.


(c) If a * b = a * c, then b = c.
For: from the given data, a^{-1} * (a * b) = a^{-1} * (a * c).
Now: a^{-1} * (a * b) = (a^{-1} * a) * b = e * b = b   by (2), (4) and (3).
Similarly: a^{-1} * (a * c) = c. Hence, b = c. Q.E.D.

(d) The identity e is unique, and so is the inverse a^{-1} of a.
For: if there are two identities e and e', then

a * e = a * e'   by (b)
i.e. e = e'      by (c).

The uniqueness of a^{-1} is proved similarly. Q.E.D.


Note : these four properties complete the table of the operational rules 
for groups, given in 6.2. 


(e) (a * b)^{-1} = b^{-1} * a^{-1}.
For: (b^{-1} * a^{-1}) * (a * b) = b^{-1} * (a^{-1} * a) * b = b^{-1} * e * b
                                 = b^{-1} * b = e
i.e. b^{-1} * a^{-1} is the inverse of a * b. Q.E.D.

(f) If b * x = a, then x = b^{-1} * a; if y * b = a, then y = a * b^{-1}.
For: b * (b^{-1} * a) = (b * b^{-1}) * a = e * a = a = b * x.
So: b^{-1} * b * (b^{-1} * a) = b^{-1} * b * x
i.e. e * (b^{-1} * a) = e * x
i.e. b^{-1} * a = x.
The other result follows similarly. Q.E.D.


Note: these last two properties show how careful we must be in the
general case when the group is not (necessarily) commutative. For
example, in general: (a * b)^{-1} ≠ a^{-1} * b^{-1}. If the group is
commutative, then everything is simpler:

(a * b)^{-1} = a^{-1} * b^{-1} and a * b^{-1} = b^{-1} * a.

The second, for a commutative group under +, provides the definition
of a unique difference: a − b = a + (−b) = (−b) + a. Similarly, for a
commutative group under ×, it gives the definition of a unique
quotient: a/b = a × (1/b) = (1/b) × a.
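
The warning about non-commutative groups can be illustrated on a small
example. The sketch below assumes, purely for illustration, the group of
the six permutations of three objects under composition; the names
compose, inverse, reversal_rule_holds and naive_rule_counterexamples are
hypothetical. It checks property (e) for every pair and lists the pairs
for which the commutative form (a * b)^{-1} = a^{-1} * b^{-1} fails.

```python
from itertools import permutations


def compose(p, q):
    """Return p * q as a permutation: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the permutation that undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


# All six permutations of (0, 1, 2), each written as a tuple of images.
S3 = list(permutations(range(3)))


def reversal_rule_holds(group):
    """Property (e): (a * b)^-1 = b^-1 * a^-1 for every pair."""
    return all(inverse(compose(a, b)) == compose(inverse(b), inverse(a))
               for a in group for b in group)


def naive_rule_counterexamples(group):
    """Pairs (a, b) for which (a * b)^-1 differs from a^-1 * b^-1."""
    return [(a, b) for a in group for b in group
            if inverse(compose(a, b)) != compose(inverse(a), inverse(b))]


failures = naive_rule_counterexamples(S3)
```

The list of failures is non-empty here because this group is not
commutative; for a commutative group it would be empty.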

(iii) Rings. As a first development on the basis of a group, consider
sets of double composition in which two binary operations are
defined: sums and products. A general case of such sets of double
composition is the ring:

DEFINITION: A set R = {a, b, c, ...} is a ring if the two operations
(+ and ×) are such that:

(1) the elements form a commutative group (R+) under addition

(2) the non-zero elements as a set (R×) under multiplication satisfy:

Closure: ab ∈ R          Associative: a(bc) = (ab)c

(3) addition and multiplication are connected by the distributive rules:
a(b + c) = ab + ac and (a + b)c = ac + bc.

The implications are that R is fully obedient under addition, that 
multiplication has only a minimum of properties (closure and 
associative) and that multiplication is distributive over addition. 
Other properties (commutative, unity and reciprocals) of products 
are not assumed. It is not even assumed that cancellation (if ab = ac,
for a ≠ 0, then b = c) is a valid procedure. A ring may have zero
divisors, i.e. non-zero a and b such that ab = 0. This leaves it open to
specify more particular, i.e. more specialised, kinds of rings. For 
example, there is the commutative ring, as opposed to the non- 
commutative type, for which ab=ba. Increasing the specialisation 
by adding further multiplicative properties, we get to a penultimate 
stage with the integral domain: a commutative ring with identity 
(unity) for which cancellation holds for products. All that this kind 
of ring lacks is a reciprocal for each element. The set J of integers is 
an example. The last, or completely obedient, stage is reached with 
a field: a commutative ring with identity (unity) and reciprocals. 
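
The stages of specialisation can be seen in a familiar family of rings.
The sketch below assumes, as an illustration not taken from the text, the
integers modulo n under addition and multiplication modulo n; the names
zero_divisors and has_reciprocals are hypothetical. Modulo 6 there are
zero divisors, so cancellation fails; modulo 7 every non-zero element has
a reciprocal.

```python
def zero_divisors(n):
    """Pairs of non-zero residues (a, b), a <= b, with a * b = 0 modulo n."""
    return [(a, b)
            for a in range(1, n)
            for b in range(a, n)
            if (a * b) % n == 0]


def has_reciprocals(n):
    """True if every non-zero residue a has some x with a * x = 1 modulo n."""
    return all(any((a * x) % n == 1 for x in range(1, n))
               for a in range(1, n))


divisors_mod_6 = zero_divisors(6)
divisors_mod_7 = zero_divisors(7)
```

With n = 6 the product 2 × 3 is zero although neither factor is zero, so
2 × 3 = 2 × 0 does not allow the 2 to be cancelled.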

(iv) Fields. The end of the development of sets of double com- 
position (under + and x) is the field, a set which comprehends in 
itself two commutative groups, one for sums and the other for pro- 
ducts, linked by the distributive rule. A direct and economical 
definition of a field can be given as an alternative: 




DEFINITION: A set F = {a, b, c, ...} is a field if it has at least two
distinct elements and if operations of + and × are defined so that:

(1) they are closed, associative and commutative, and connected by
the distributive rule: a(b + c) = ab + ac

(2) there is always a solution of the equations (x ∈ F):
x + a = b and ax = b for any a and b ∈ F
except that a ≠ 0 must be imposed for the second equation.


The requirement of two distinct elements ensures that there is some- 
thing to act as zero for addition, and something different for unity in 
multiplication. Conditions (2) ensure that linear equations can be 
solved within a field. In effect, x + a = b gives the difference b − a and
ax = b gives the quotient b/a.

From the definition, it is easily established that the set of all 
elements of F is a commutative group under + and that the set of 
all non-zero elements is a commutative group under x, i.e. that F 
is the most specialised of rings. Checking back at the economical
definition of a group in (ii) above, we find that all we need to prove
is that e exists for e * a = a and that a^{-1} exists for a^{-1} * a = e, both
when * is + and when * is ×. All the rest follows, e.g. that e is unique
in both cases (0 for sums and 1 for products). The proof for
multiplication, taken over the non-zero elements, is as follows: There is
always a solution of ax = b, i.e. for xa = b (commutative property).
Define e as a solution of xb = b, for some given non-zero b: eb = b. Now
let a be any member of F. Then x exists for bx = a and so:
ea = ebx = bx = a. Hence e is such that ea = a for any
a, i.e. e is the identity required. Further, consider the equation
xa = e. Write the solution x = a^{-1}, so that a^{-1}a = e, and a^{-1} is the
reciprocal of a required.

(v) Vector Spaces. The double composition of a ring or field is
achieved by supplementing an additive group with the second
operation of multiplication. An additive group can be made into a
system of double composition in a different way, the second operation
being a new one: scalar multiplication. The result is a vector space
V, comprising a set {u, v, w, ...} of elements called vectors. To write
scalar products of vectors in V, we need another set, i.e. a set of
scalars. This is an outside set {a, b, c, ...}, different in nature from
V. We take it to be a field and denote it by F. Scalar multiplication
is then an operation which takes a vector u from the main set V and
a scalar a from the outside field F, and which gives the product au
as a vector of V.


DEFINITION: V ={u, v, w,...} is a vector space over the field 
F ={a, b,c, ...} of scalars if the two operations of addition of vectors 
and of the product of a vector by a scalar are such that: 


(1) V is a commutative group under addition
(2) scalar multiplication is closed (au ∈ V) with properties:
Associative: a(bu) = (ab)u
Distributive: a(u + v) = au + av
and (a + b)u = au + bu
and the unit scalar (1) is such that 1u = u.

Notice that a vector space does not need the operation of
multiplication of vectors, just as a ring or field does not need scalar
products. Each system has two operations: sums and scalar products in one
case, sums and products in the other. The following is the simplest
example of a vector space.

Write the vector v = (x, y), a pair of real numbers. Define addition:

v_1 + v_2 = (x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2).

Call the field of real numbers F, the set of scalars used. Define scalar
products: av = a(x, y) = (ax, ay). Then the set of all number pairs v
is a vector space V over F. The vectors may be shown as points P
or vectors OP in a plane, or as complex numbers on an Argand
diagram. In any case, the difference between V and F is clear: V
has elements which are number pairs (e.g. points in a plane) and F
comprises real numbers. The example is easily generalised to a set of
n-tuples of numbers.


15.4. Limits and continuity. (Reference: Chapter 9.) A general 
definition of a limit process, applicable in all cases, is given first, 
followed by particular applications of it. 

(i) Limit Processes. A limit process depends on the initial
specification of a set of stages through which the process is to run.

DEFINITION: A set of stages {M, N, P, ...} is a set with the property
that some stages are related M > N (meaning M more advanced than,
or contained in, N) where the relation > satisfies the conditions:

(i) Transitivity: if M > N and N > P, then M > P

(ii) Extension: if M and N are any stages, then there is a stage P
such that P > M and P > N.


The relation > is akin to an ordering (see the properties of order in 
2.2 and 7.8) but the stages are not necessarily completely ordered, 
i.e. the stages are not necessarily contained one within the other as a 
set of nesting intervals. The import of property (ii) is that stages M
and N can overlap, in which case there is another (and more advanced)
stage P in the overlap. A typical set of stages is a set of
neighbourhoods N of a fixed point x = α in an interval [a, b]. The neighbourhoods
can overlap in all kinds of ways but a sequence of nesting intervals 
can be picked out, each more advanced than (contained in) another. 


DEFINITION: A limit process exists if there is a mapping of a set of
stages N onto a set of intervals F(N) such that F(M) is contained in
F(N) whenever M > N. The final residue of the limit process is the
interval F which is the intersection of all F(N). The limit process is
convergent to L if F consists of the single element L: Lim F(N) = L.

The intervals referred to are closed, i.e. [a, b] is the set of x such that
a ≤ x ≤ b. The definition states that F is an interval; this needs to be
proved.


Proof: Any two intervals F(M) and F(N) must overlap and contain
a third F(P). For, there is a stage P such that P > M, implying
F(P) contained in F(M), and such that P > N, implying F(P)
contained in F(N). Let F(N) = [c_n, d_n] overlap with F(M) = [c_m, d_m].
Then c_n ≤ d_m, so that the set of c_n (all n) has an upper bound and so
a LUB c. Similarly the set of d_n (all n) has a GLB d. Since c_n ≤ d_m for
all n and m, c ≤ d, defining an interval [c, d]. This may be a single
element (case c = d), or a finite interval (case c < d). Now, if λ is in
[c, d], then c_n ≤ c ≤ λ ≤ d ≤ d_n, all n, i.e. λ is in all F(N).
Conversely, if λ is in all F(N), then c_n ≤ λ ≤ d_n, all n, i.e. λ is an
upper bound of c_n and c ≤ λ. Similarly λ ≤ d. Hence λ is in [c, d]. So F
is the interval [c, d]. Q.E.D.
There are various equivalent expressions of convergence:

THEOREM: F(N) converges to L if and only if:

(1) for a given positive ε (however small), there is a stage N so that
L − ε < x < L + ε for all x of F(N)

(2) for a given L' ≠ L (however close), there is a stage N so that F(N)
does not contain L'

(3) for a given neighbourhood of L (however small), there is a
stage N so that F(N) is contained in that neighbourhood.


Proof: Only (1) need be established; the others follow from it.
Directly: suppose F(N) converges to L and take any positive ε.
Then there is a stage M such that L − ε is not in F(M) and a stage
P such that L + ε is not in F(P). Take N more advanced than M
and P. N > M implies F(N) contained in F(M) and N > P implies
F(N) contained in F(P). Hence N is the required stage so that
F(N) excludes both L − ε and L + ε. Conversely: suppose N exists
for F(N) to exclude L − ε and L + ε, for a given ε. Take L' ≠ L and
write ε = |L − L'|. For this ε, let N be the stage so that F(N)
excludes L ± ε, i.e. excludes L'. Hence the final residue F excludes
L', which is any element ≠ L. So F contains L only and

Lim F(N) = L. Q.E.D.
(ii) Limits of a Function of a Real Variable. Consider Lim f(x) as x → α
for a function f(x) defined on the domain X, where X need not
include the given point x = α but must include some points in each
neighbourhood of α. Take as stages N the set of neighbourhoods of α,
satisfying the required conditions for a set of stages. If f(x) is bounded
in each N, there is a smallest interval F(N) containing all f(x) for
x ∈ N. There is a mapping of N onto F(N) and, by the definition of
F(N), it follows that F(M) is contained in F(N) wherever M > N
(M contained in N). Hence, if f(x) is bounded in each N, there is a
limit process over stages N. Then:


DEFINITION: If f(x) is bounded and the limit process F(N) converges
to L over the stages N, then Lim f(x) = L as x → α. Necessary and sufficient
conditions are that, for any positive ε (however small), there is a
neighbourhood [a_ε, b_ε] of α so that

L − ε < f(x) < L + ε for all x in a_ε ≤ x ≤ b_ε.

The stated conditions are those of the Theorem of (i). The following
properties follow. If f(x) and g(x) have limits as x → α, then, with all
limits taken as x → α:

Lim {f(x) ± g(x)} = Lim f(x) ± Lim g(x);

Lim {f(x) × g(x)} = Lim f(x) × Lim g(x);

Lim f(x)/g(x) = Lim f(x) / Lim g(x) if Lim g(x) ≠ 0.
Proof: take the case of a sum to illustrate. Let N' be stages for
f(x) → L' as x → α, i.e. the smallest intervals F(N') = [c_n', d_n'] converge
to L'. Let N'' be stages for g(x) → L'' as x → α, i.e. the smallest
intervals G(N'') = [c_n'', d_n''] converge to L''. Take N as the stages
common to N' and N''. Since f(x) and g(x) are bounded in each N, so is
f(x) + g(x), and there is a least interval H(N) containing all f(x) + g(x)
in N. But H(N) is contained in [c_n' + c_n'', d_n' + d_n''] which converges
to L' + L''. So H(N) converges to L' + L'' and this is the limit of
f(x) + g(x) as x → α. Q.E.D.

The same definition and properties apply to Lim f(x) = L as x → ∞. It is
only necessary to re-specify the stages N so that N is the set of all
x ≥ x_0, for a given x_0 (however large). Lim f(x) as x → −∞ follows
similarly for stages N, where N is the set of all x ≤ x_0 for a given x_0
which is negative (however large).

(iii) Continuity of a Function of a Real Variable. Consider a function
f(x) defined on the domain X and let x = α be a particular point of X.

DEFINITION: f(x) is continuous at x = α if (1) Lim f(x) as x → α exists, (2)
f(α) is defined, and (3) Lim f(x) = f(α).

It follows from the properties of limits in (ii):
If f(x) and g(x) are continuous at x = α, then so are f(x) ± g(x),
f(x) × g(x), and f(x)/g(x), provided that g(α) ≠ 0 in the last case.


It is important to distinguish between the case where f(x) is
continuous at one value x = α and that where f(x) is continuous for all
x ∈ X. If f(x) is continuous at x = α, then it is locally bounded, i.e.
bounded over some neighbourhood N of α. If f(x) is continuous over
X, then it is locally bounded everywhere in X. The most important
result arises when f(x) is defined and continuous over an interval,
as a particular case of the domain X:


THEOREM: If f(x) is continuous over the interval a ≤ x ≤ b, then:

(1) f(x) is bounded in the interval

(2) the range of f(x) is itself an interval: c ≤ f(x) ≤ d

(3) f(x) attains its GLB c = f(α) at some α (a ≤ α ≤ b) and attains its
LUB d = f(β) at some β (a ≤ β ≤ b).

Proof: (1) Let S be the set of λ such that f(x) has an upper bound in
[a, λ], where a ≤ λ ≤ b. S is not empty (λ = a at least) and it has an
upper bound λ ≤ b. Hence S has a LUB, say μ. But f(x) is continuous
at μ (a ≤ μ ≤ b) and there is a neighbourhood of μ in which f(x) is
bounded. Hence f(x) has an upper bound in the combined interval,
which can be written [a, k], made up of the union of [a, μ] and
the neighbourhood of μ. Since k > μ (LUB of S), this is impossible
except when k is outside the interval [a, b]. So, f(x) has an upper
bound in [a, b] ⊂ [a, k]. Similarly f(x) has a lower bound in [a, b].
Q.E.D.

(2) Write c as the GLB and d as the LUB of f(x) over the interval
[a, b]. We have to prove that, if β is any value in the interval
c < β < d, then there is an α (a ≤ α ≤ b) such that f(α) = β. If f(a) = β,
there is nothing to prove. Otherwise, suppose f(a) < β. (A similar
argument holds if f(a) > β.) Consider the set S of λ such that f(x) has
an upper bound < β over [a, λ], where a ≤ λ ≤ b. Let α be the LUB
of λ in S and suppose f(α) > β. Then continuity of f(x) at α gives a
neighbourhood of α with f(x) > β and this neighbourhood must
overlap with S where f(x) < β. This is a contradiction, so that
f(α) ≤ β. Now suppose f(α) < β, so that there is a neighbourhood of α
also with f(x) < β. Form the combined interval [a, k] of the interval
[a, α] and the neighbourhood of α, so that f(x) has an upper bound
< β over [a, k]. Again, as in the proof of (1), k must lie outside
[a, b]. Hence f(x) has an upper bound < β in [a, b], contained in
[a, k]. This contradicts the facts that the LUB of f(x) is d and that
β is set so that β < d. Hence f(α) ≥ β. The only possibility is f(α) = β.
Q.E.D.

(3) follows at once from the proof of (2). 

Consequently, if f(x) is continuous over an interval, then both the 
domain and the range of the function are intervals. The result (3) is 
often called Weierstrass’ Maximum Theorem after Weierstrass 
(1815-97). It ensures that f(x) attains its LUB d at some point of the
interval (a ≤ x ≤ b) and this LUB might be called the maximum of
f(x) in the interval. However, it is customary to limit the term to a
local maximum (11.2). The LUB is better termed a supremum of
f(x) in the interval; it is a maximum of maxima (of which there may
be several). Hence, Weierstrass' Theorem states that Sup f(x) = d,
attained at some point of the interval (a ≤ x ≤ b). Similar remarks
apply to the GLB c, the minimum of minima of f(x) in the interval.

(iv) Derivatives. Consider a function f(x) defined on the interval
[a, b] which contains x = α as an interior point. Then:

DEFINITION: The derivative of f(x) at x = α is:

f'(α) = Lim (f(x) − f(α))/(x − α) as x → α, if the limit exists.

Write F(x) = (f(x) − f(α))/(x − α), so that F(x) is not defined at x = α.
It is still valid, however, to consider the limit of F(x) as x → α. If the
limit exists, it is f'(α); if the limit does not exist, neither does f'(α).
A necessary condition is:


THEOREM: If f'(α) exists, then f(x) is continuous at x = α.

Proof: Write φ(x) = F(x)(x − α) + f(α) = f(x) except at x = α, where
F(x) is as written above. Though φ(x) is not defined at x = α, the limit
does exist:

φ(x) → f'(α) × 0 + f(α) = f(α)

as x → α. Here, since φ(x) = f(x) (x ≠ α), f(x) → f(α) as x → α, i.e. f(x) is
continuous at x = α. Q.E.D.
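
The limit of F(x) can be watched numerically as a limit process over
shrinking neighbourhoods of α. The sketch below is an approximation under
assumptions: the example f(x) = x^2 with α = 3 is chosen for illustration,
each neighbourhood is sampled at finitely many points (so the computed
range only approximates the smallest interval containing F(x)), and the
names difference_quotient and quotient_range are hypothetical.

```python
def difference_quotient(f, alpha, x):
    """F(x) = (f(x) - f(alpha)) / (x - alpha), undefined at x = alpha."""
    if x == alpha:
        raise ValueError("F(x) is not defined at x = alpha")
    return (f(x) - f(alpha)) / (x - alpha)


def quotient_range(f, alpha, h, samples=100):
    """Approximate (min, max) of F over 0 < |x - alpha| <= h by sampling."""
    values = []
    for k in range(1, samples + 1):
        step = h * k / samples
        values.append(difference_quotient(f, alpha, alpha - step))
        values.append(difference_quotient(f, alpha, alpha + step))
    return min(values), max(values)


def square(x):
    return x * x


# Ranges of F over neighbourhoods of alpha = 3 with half-widths 1, 0.1, 0.01.
ranges = [quotient_range(square, 3.0, h) for h in (1.0, 0.1, 0.01)]
```

For this example the sampled ranges shrink round the value 6, which is
the derivative of x^2 at x = 3.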


15.5. Integrals: the fundamental theorem of the calculus. (Reference:
Chapter 10.) The function f(x) is defined on the interval [a, b] where
a < b, and it is bounded on the interval. A partition P of [a, b] is any
set of n segments (n any positive integer) with dividing points
a = x_0, x_1, x_2, ..., x_n = b. A refinement P' of P is another partition
which includes x_0, x_1, x_2, ..., x_n among its dividing points. Write
P' > P when P' is a refinement of (more advanced than) P. Then the
partitions P are stages of a limit process (15.4) since they have both
the necessary properties of transitivity and extension. The first of
these is obvious and for the second, if P_1 and P_2 are any two
partitions, take P as including all the dividing points of both to ensure
that P > P_1 and P > P_2.


Form the sum $\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$ over P, where $x_r'$ (r = 1, 2, ... n)
is any selected point in the rth segment $(x_{r-1} \le x_r' \le x_r)$. Since f(x) is
bounded, it has a GLB $L_r$ and a LUB $G_r$ in the rth segment:
\[ L_r \le f(x_r') \le G_r. \]
So the sum is contained in the interval
\[ \Big[\, \sum_{r=1}^{n} L_r (x_r - x_{r-1}),\ \sum_{r=1}^{n} G_r (x_r - x_{r-1}) \,\Big] \]
and this is the smallest such interval. The interval depends on P but
not on the choice of $x_r'$. Hence:


DEFINITION: The Riemann Sum for f(x) and the partition P of
[a, b] is:
\[ F(P) = \Big[\, \sum_{r=1}^{n} L_r (x_r - x_{r-1}),\ \sum_{r=1}^{n} G_r (x_r - x_{r-1}) \,\Big] \]
and this interval is denoted $S_a^b f(x)\,\Delta x$ over P.


The Riemann Sum is the smallest interval containing
$\sum_{r=1}^{n} f(x_r')(x_r - x_{r-1})$


for all choices of x,’ in a given partition P. It is named after Riemann 
(1826-66). Among the properties which follow immediately from the 
definition, note: 


(1) If L is the GLB and G the LUB of f(x) in [a, b], then $S_a^b f(x)\,\Delta x$
over P is contained in the interval $[L(b-a),\ G(b-a)]$ for all P.

(2) $S_a^b f(x)\,\Delta x$ over P $= S_a^c f(x)\,\Delta x$ over $P_1$ $+\ S_c^b f(x)\,\Delta x$ over $P_2$ (a < c < b)
for two parts $P_1$ and $P_2$ of P with dividing point c.

The meaning of the second property is that the lower point (upper
point) of the interval $S_a^b$ is the sum of the lower points (upper points)
of the intervals $S_a^c$ and $S_c^b$.


From the definition, it also follows that, if P' > P, then the interval
F(P') is contained in the interval F(P). Proof: suppose that the
segment $(x_r - x_{r-1})$ of P is split by $\xi$ into two segments of P': $(\xi - x_{r-1})$
and $(x_r - \xi)$. If $L_r'$ and $L_r''$ are the GLB of f(x) in these two segments,
then $L_r' \ge L_r$, $L_r'' \ge L_r$. The contribution of the segment to the lower
end of F(P) is $L_r(x_r - x_{r-1})$; the contribution to the lower end of
F(P') is $L_r'(\xi - x_{r-1}) + L_r''(x_r - \xi) \ge L_r(x_r - x_{r-1})$, i.e. the lower end
of F(P') $\ge$ that of F(P). Similarly, the upper end of F(P') $\le$ that
of F(P). This extends to any subdivision of P. Q.E.D.
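
The nesting of F(P') inside F(P) can be watched numerically. The following is a minimal sketch, not part of the formal development: the helper names `riemann_interval` and `refine` are illustrative, the test function $f(x) = x^2$ on [0, 1] is an arbitrary choice, and the GLB and LUB in each segment are only estimated by sampling (exact here because the function is increasing).

```python
def riemann_interval(f, points, samples=200):
    """Estimate F(P) = [sum L_r*(x_r - x_(r-1)), sum G_r*(x_r - x_(r-1))]."""
    lower = upper = 0.0
    for left, right in zip(points, points[1:]):
        # Sample f in the segment to estimate its GLB and LUB there (an approximation).
        values = [f(left + (right - left) * i / samples) for i in range(samples + 1)]
        lower += min(values) * (right - left)
        upper += max(values) * (right - left)
    return lower, upper


def refine(points):
    """Return a refinement P' of P: every segment is split at its midpoint."""
    refined = []
    for left, right in zip(points, points[1:]):
        refined += [left, (left + right) / 2]
    return refined + [points[-1]]


f = lambda x: x * x          # illustrative bounded function on [0, 1]
P = [0.0, 0.5, 1.0]
intervals = []
for _ in range(5):           # five successive stages P < P' < P'' < ...
    intervals.append(riemann_interval(f, P))
    P = refine(P)

for lo, hi in intervals:
    print(f"[{lo:.6f}, {hi:.6f}]")
```

Each printed interval lies inside the one before it, and all of them contain the value 1/3 on which the process contracts.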

Hence there is a mapping of the stages P onto the intervals F(P)
such that F(P') is contained in F(P) whenever P' > P, i.e. a limit
process (15.4).


DEFINITION: If f(x) is bounded on [a, b], where a < b, the Riemann
Integral is the limit of F(P) as P advances, if the limit exists:
\[ \int_a^b f(x)\,dx = \lim_{P} S_a^b f(x)\,\Delta x \ \text{over } P. \]


If the limit process F'(P) does not converge, the integral does not 
exist. From the properties (1) and (2) of Riemann Sums, it follows: 


(i) If L is the GLB and G the LUB of f(x) in [a, b], then:
\[ L(b-a) \le \int_a^b f(x)\,dx \le G(b-a). \]

(ii) If a < c < b, then $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$.


In both cases, it is assumed that the integral of f(x) exists. 
It remains to establish the Fundamental Theorem of the Calculus. 
In its simplest, if not quite its strongest, form: 


THEOREM: If f(x) is continuous on the interval [a, b], then at each x
of [a, b]: $F(x) = \int_a^x f(u)\,du$ exists, is continuous and has derivative
$F'(x) = f(x)$.

Proof: Let x and x + h be points in [a, b]. Since f(x) is continuous,
$f(x+h) \to f(x)$ as $h \to 0$, which can be expressed as follows, by (3) of
the Theorem of 15.4 (i) above. Given a neighbourhood N of f(x),
there exists a neighbourhood of x for which the range of f(x) is contained
in N. Further, by the Theorem of 15.4 (iii) (Weierstrass'
Maximum Theorem), f(x) is bounded and attains its bounds in any
interval within [a, b]. Let c be the GLB and d the LUB of f(x) in the
interval [x, x + h]. Then, given N, there is an h such that [c, d] is
contained in N.

Since f(x) is bounded, Riemann Sums exist for a partition P of
[a, b], and for the constituent parts of P separately. Consider
$S_a^x f(x)\,\Delta x$, $S_a^{x+h} f(x)\,\Delta x$ and $S_x^{x+h} f(x)\,\Delta x$. From the limit process, $S_a^x f(x)\,\Delta x$
as an interval for various P has a final residue which (as shown in
15.4) is itself an interval. Write the final residue, not dependent on
P, as $[L_a(x), G_a(x)]$ which indicates that it depends both on the fixed
lower point a and on the variable upper point x. Similarly, the final
residue of $S_a^{x+h} f(x)\,\Delta x$ over P is the interval $[L_a(x+h), G_a(x+h)]$, and
that of $S_x^{x+h} f(x)\,\Delta x$ is $[L_x(x+h), G_x(x+h)]$.

The position, now, is that $[L_x(x+h), G_x(x+h)]$ as final residue is
contained in the interval $S_x^{x+h} f(x)\,\Delta x$ over P, and this is contained in
$[c(x+h-x),\ d(x+h-x)] = [ch, dh]$ by property (1) above. Hence:
\[ \Big[\, \frac{L_x(x+h)}{h},\ \frac{G_x(x+h)}{h} \,\Big] \]
is contained in [c, d], i.e. in N (given). But, by property (2) above,
\[ S_x^{x+h} f(x)\,\Delta x = S_a^{x+h} f(x)\,\Delta x - S_a^{x} f(x)\,\Delta x. \]
This means that
\[ L_x(x+h) = L_a(x+h) - L_a(x) \quad\text{and}\quad G_x(x+h) = G_a(x+h) - G_a(x). \]
So:
\[ \Big[\, \frac{L_a(x+h) - L_a(x)}{h},\ \frac{G_a(x+h) - G_a(x)}{h} \,\Big] \]
is contained in N. This is true, given N, for any sufficiently small h
(positive or negative).


A limit process is now set up. As N, any neighbourhood of f(x),
contracts on f(x), h gets smaller and the function $L_a(x)$ has a
derivative:
\[ L_a'(x) = \lim_{h \to 0} \frac{L_a(x+h) - L_a(x)}{h} = f(x). \]
Similarly:
\[ G_a'(x) = \lim_{h \to 0} \frac{G_a(x+h) - G_a(x)}{h} = f(x). \]

Hence $G_a(x) - L_a(x)$ has zero derivative, i.e. $G_a(x) - L_a(x)$ = constant,
which must be zero since $G_a(a) = L_a(a) = 0$. So $G_a(x) = L_a(x)$ and each
has derivative f(x). Write the common value F(x), such that
$F'(x) = f(x)$. But the final residue of $S_a^x f(x)\,\Delta x$ is $[L_a(x), G_a(x)]$, now
shown to be a single value F(x). Hence $\int_a^x f(x)\,dx$ exists; it is F(x)
where F(x) is continuous with derivative $F'(x) = f(x)$. Q.E.D.
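
As a numerical illustration of the theorem (a sketch only, under stated assumptions): the integral is estimated by a midpoint rule with a hypothetical helper `integral`, the integrand cos and the values a = 0, x = 1, h = 0.001 are arbitrary choices, and the quotient $(F(x+h) - F(x))/h$ is checked to lie between the GLB c and LUB d of f on [x, x + h].

```python
import math


def integral(f, a, x, n=20000):
    """Midpoint-rule estimate of the integral of f from a to x."""
    width = (x - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


f = math.cos                      # illustrative continuous function
a, x, h = 0.0, 1.0, 1e-3


def F(t):
    """F(t) = integral of f from the fixed lower point a to t."""
    return integral(f, a, t)


quotient = (F(x + h) - F(x)) / h  # should lie in [c, d] and approach f(x)
c, d = f(x + h), f(x)             # cos decreases on [1, 1.001], so c = f(x+h), d = f(x)

print(quotient, f(x))
```

Shrinking h squeezes [c, d], and with it the quotient, onto f(x), which is the limit process used in the proof.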


15.6. Absolute and uniform convergence. (Reference: Chapter 11.)
An infinite series $\sum u_n$ is given, with no restriction on the sign of $u_n$.


DEFINITION: $\sum u_n$ is absolutely convergent if $\sum |u_n|$ is convergent.
The following result, called Dirichlet's Theorem after Dirichlet
(1805-59), is derived from the definition:


THEOREM: The sum of an absolutely convergent series does not depend 
on the order taken for the terms of the series. 


Proof: given $\sum u_n$, absolutely convergent, write $\sum v_n$ and $\sum w_n$ where:
$v_n = u_n$ ($u_n$ positive) and $= 0$ ($u_n$ negative)
$w_n = 0$ ($u_n$ positive) and $= -u_n$ ($u_n$ negative)


so that $\sum v_n$ is the sum of the positive terms of $\sum u_n$ with zero terms
interspersed and $\sum w_n$ is similarly defined for the negative terms of
$\sum u_n$. Then
\[ \sum u_n = \sum v_n - \sum w_n \quad\text{and}\quad \sum |u_n| = \sum v_n + \sum w_n \]
and both $\sum v_n$ and $\sum w_n$ (series of non-negative terms) are convergent.
Let $\sum v_n = S$. Then the sum of any (finite) number of terms of $\sum v_n$ is
$\le S$. Let $v_n'$ be any re-arrangement of the order of terms in $\sum v_n$.
Then the sum of any (finite) number of terms of $\sum v_n'$ is $\le S$, since all
the $v_n'$'s are $v_n$'s. Hence $\sum v_n'$ is convergent to sum $S' \le S$. Similarly,
starting with $\sum v_n' = S'$ and re-arranging to $\sum v_n = S$, we have $S \le S'$.
Hence, $S = S'$. Similarly for $\sum w_n$ and any re-arrangement $\sum w_n'$. So,
for any re-arrangement $\sum u_n'$ of $\sum u_n$:
\[ \sum u_n' = \sum v_n' - \sum w_n' = \sum v_n - \sum w_n = \sum u_n. \quad \text{Q.E.D.} \]
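
Dirichlet's Theorem can be checked on a concrete series. The sketch below is only an illustration: the series $u_n = (-1)^{n+1}/n^2$ and the particular re-arrangement (two positive terms, then one negative term) are assumptions chosen for the example, and partial sums of 30000 terms stand in for the sums themselves.

```python
from itertools import count, islice


def original():
    """Terms u_n = (-1)^(n+1) / n^2 in their natural order."""
    for n in count(1):
        yield (-1) ** (n + 1) / n ** 2


def rearranged():
    """The same terms re-ordered: two positive terms, then one negative term."""
    positives = (1 / n ** 2 for n in count(1, 2))    # n = 1, 3, 5, ...
    negatives = (-1 / n ** 2 for n in count(2, 2))   # n = 2, 4, 6, ...
    while True:
        yield next(positives)
        yield next(positives)
        yield next(negatives)


terms = 30000
s_original = sum(islice(original(), terms))
s_rearranged = sum(islice(rearranged(), terms))

print(s_original, s_rearranged)
```

The two partial sums agree to about four decimal places, and the gap closes further as more terms are taken.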


The important application of this Theorem is to establish the 
validity of the process of multiplication of series, provided that they 
are absolutely convergent: 


THEOREM: If $\sum u_n$ and $\sum v_n$ are absolutely convergent, then the product
series
\[ u_1 v_1 + (u_1 v_2 + u_2 v_1) + (u_1 v_3 + u_2 v_2 + u_3 v_1) + \ldots \]
is absolutely convergent to sum $\sum u_n \times \sum v_n$.


Proof: all possible pairings of terms can be set out in double array:
\[
\begin{array}{cccc}
u_1 v_1 & u_1 v_2 & u_1 v_3 & \ldots \\
u_2 v_1 & u_2 v_2 & u_2 v_3 & \ldots \\
u_3 v_1 & u_3 v_2 & u_3 v_3 & \ldots \\
\ldots & \ldots & \ldots & \ldots
\end{array}
\]

Put the whole set into a single series in two ways:

(i) $u_1 v_1 + (u_1 v_2 + u_2 v_1) + (u_1 v_3 + u_2 v_2 + u_3 v_1) + \ldots$
where the nth group is composed of terms in a diagonal of the array,
starting from $u_1 v_n$ in the first row and running down from right to
left. This is the product series.

(ii) $u_1 v_1 + (u_1 v_2 + u_2 v_2 + u_2 v_1) + (u_1 v_3 + u_2 v_3 + u_3 v_3 + u_3 v_2 + u_3 v_1) + \ldots$
where the nth group is composed of the term $u_n v_n$, plus all terms above
this in the nth column of the array, plus all terms to its left in the
nth row of the array. Then:

first group $= u_1 v_1$
second group $= (u_1 + u_2)(v_1 + v_2) - u_1 v_1$
third group $= (u_1 + u_2 + u_3)(v_1 + v_2 + v_3) - (u_1 + u_2)(v_1 + v_2)$

Sum of n groups $= (u_1 + u_2 + \ldots + u_n)(v_1 + v_2 + \ldots + v_n)$
$\to \sum u_n \times \sum v_n$ as $n \to \infty$.

Hence the series (ii) is absolutely convergent to $\sum u_n \times \sum v_n$. By
Dirichlet's Theorem, (i) as a re-arrangement of (ii) is absolutely
convergent to $\sum u_n \times \sum v_n$. Q.E.D.
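
A small numerical check of the product series (i) follows. It is a sketch under assumptions: the two geometric series $u_n = (-\tfrac12)^n$ and $v_n = (\tfrac13)^n$ are chosen only because their sums, $-\tfrac13$ and $\tfrac12$, are known in closed form, and the helper name `product_series_partial` is illustrative.

```python
def product_series_partial(u, v, groups):
    """Sum the first `groups` diagonal groups of the product series (i)."""
    total = 0.0
    for n in range(1, groups + 1):
        # nth group: u_1 v_n + u_2 v_(n-1) + ... + u_n v_1
        total += sum(u(r) * v(n + 1 - r) for r in range(1, n + 1))
    return total


u = lambda n: (-0.5) ** n      # sum over n >= 1 is -1/3
v = lambda n: (1 / 3) ** n     # sum over n >= 1 is 1/2

partial = product_series_partial(u, v, 60)
print(partial, (-1 / 3) * (1 / 2))
```

The diagonal grouping reaches the product of the two sums, as the theorem asserts for absolutely convergent series.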

Consider a series $\sum u_n(x)$, each term a function of x defined on some
domain X. A necessary and sufficient condition that $\sum u_n(x)$ is
absolutely convergent, i.e. $\sum |u_n(x)|$ convergent, at each x of X is:


Given $\varepsilon$, there is an integer N(x) for each x of X such that
\[ \sum_{s=n+1}^{m} |u_s(x)| < \varepsilon \quad \text{for all } m > n > N(x) \qquad (1) \]

This follows from the Theorem of 11.3 since the sum shown, from the
(n + 1)th to the mth term inclusive, is the difference between the sums
to n and to m terms of the series $\sum |u_n(x)|$ of positive terms. In
general, the choice of N in (1) must depend on the x taken; it cannot
be assumed that the same N will serve for all x of X. If this happens
to be so, the series is uniformly convergent as well as absolutely
convergent:


DEFINITION: The series $\sum u_n(x)$ is uniformly convergent over X if
it is absolutely convergent so that (1) holds for a constant N.


The following is one of the tests for uniform convergence: 


M Test: The series $\sum u_n(x)$ is uniformly convergent over X if a convergent
series $\sum M_n$ of positive and constant terms exists so that
$|u_n(x)| \le M_n$ for each n and for all x of X.


Proof: Since $\sum M_n$ converges, given $\varepsilon$ there is an integer N so that
$\sum_{s=n+1}^{m} M_s < \varepsilon$ for all m > n > N. Hence,
\[ \sum_{s=n+1}^{m} |u_s(x)| \le \sum_{s=n+1}^{m} M_s < \varepsilon \quad \text{for all } m > n > N. \]
Here N is not dependent on x of X, i.e. $\sum u_n(x)$ is uniformly convergent.
Q.E.D.


Apply these results to a power series. By (1), $\sum a_n x^n$ has radius of
convergence r (absolutely convergent over $-r < x < r$) if and only if:

Given $\varepsilon$, then there is an integer N(x) for each x ($-r < x < r$) such
that
\[ \sum_{s=n+1}^{m} |a_s|\,|x|^s < \varepsilon \quad \text{for all } m > n > N(x) \qquad (2) \]

For uniform convergence, N(x) in (2) needs to be replaced by
constant N.


THEOREM: If $\sum a_n x^n$ has radius of convergence r, the series is uniformly
convergent over $-h \le x \le h$ for any constant h < r.


Proof: Since $\sum a_n x^n$ is absolutely convergent over $-r < x < r$, the
series is absolutely convergent for x = h. The series of positive constant
terms $\sum |a_n| h^n$ is convergent. But $|a_n x^n| \le |a_n| h^n$ for all x
in $-h \le x \le h$. Hence, by the M Test, $\sum a_n x^n$ is uniformly convergent
over $-h \le x \le h$. Q.E.D.
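
The role of the constant N can be seen on the geometric series $\sum x^n$ (all $a_n = 1$, r = 1). In this sketch, which is an illustration rather than part of the proof, N is fixed from the constant series $M_s = h^s$ alone and the tail in (2) is then checked on a grid of x in $[-h, h]$; the values h = 0.9 and $\varepsilon = 10^{-3}$, the grid, and the helper name `tail` are assumptions made for the example.

```python
def tail(x, n, m):
    """Sum of |x|^s for s = n+1, ..., m: the tail in (2) with every a_s = 1."""
    return sum(abs(x) ** s for s in range(n + 1, m + 1))


h, eps = 0.9, 1e-3

# Choose N using only M_s = h^s: the whole geometric tail beyond N is h^(N+1) / (1 - h).
N = 0
while h ** (N + 1) / (1 - h) >= eps:
    N += 1

# The same N must now serve for every x with -h <= x <= h.
grid = [h * (2 * i / 200 - 1) for i in range(201)]
worst = max(tail(x, N + 1, N + 500) for x in grid)

print(N, worst)
```

The largest tail over the grid stays below $\varepsilon$ although N was chosen without reference to x.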

The significance of this result is that, within the radius of convergence,
a power series is uniformly convergent; this implies that
it is absolutely convergent and, hence, convergent, though not
conversely. If we write the sum $f(x) = \sum_{s=0}^{n} a_s x^s + R_n(x)$, then $R_n(x) \to 0$
as $n \to \infty$ within the radius of convergence. So, given $\varepsilon$, there is an
integer N such that, for each x within the radius of convergence:
\[ |R_n(x)| < \varepsilon \quad \text{for all } n > N \qquad (3) \]
For absolute convergence, (3) may hold only for N(x) depending on
x; for uniform convergence, it holds for constant N. The theorem
shows that (3) holds for constant N within the radius of convergence
of the power series (over any interval $-h \le x \le h$ with h < r).


15.7. Exponential and logarithmic functions. (Reference: Chapter
12.) The logarithmic function, its inverse the exponential function,
and the extension to a power function, can all be derived from a
single integral, that of the algebraic function y = 1/x. In the domain
x > 0, the function y = 1/x is continuous and decreasing. Hence, its
integral exists in this domain.


DEFINITION: $\log x = \int_1^x \dfrac{du}{u}$ for x > 0.


Two basic properties follow:
(1) log x is continuous and increasing, with derivative
\[ D \log x = \frac{1}{x} > 0 \quad (x > 0) \]

(2) $\log(xy) = \log x + \log y$; $\log\left(\dfrac{1}{x}\right) = -\log x$; $\log\left(\dfrac{x}{y}\right) = \log x - \log y$;
$\log(x^n) = n \log x$ for integral n.


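Proof: (1) is a consequence of the Fundamental Theorem of the
Calculus. The result for log (xy) in (2) is established by substituting
u = yt (for given y) in the definition:
\[ \frac{du}{u} = \frac{y\,dt}{yt} = \frac{dt}{t}. \]
As t ranges from 1 to x, u ranges from y to xy:
\[ \int_y^{xy} \frac{du}{u} = \int_1^x \frac{dt}{t} \quad\text{i.e.}\quad \int_1^{xy} \frac{du}{u} = \int_1^x \frac{dt}{t} + \int_1^y \frac{du}{u}. \]
Hence log xy = log x + log y.

As a particular case: $\log\left(x \cdot \dfrac{1}{x}\right) = \log x + \log\left(\dfrac{1}{x}\right)$.
But $\log\left(x \cdot \dfrac{1}{x}\right) = \log 1 = 0$ by the definition.
Hence $\log\left(\dfrac{1}{x}\right) = -\log x$.

Then: $\log\left(\dfrac{x}{y}\right) = \log\left(x \cdot \dfrac{1}{y}\right) = \log x + \log\left(\dfrac{1}{y}\right) = \log x - \log y$.

Finally: $\log x^2 = \log(x \times x) = \log x + \log x = 2 \log x$
and $\log x^n = n \log x$ by induction. Q.E.D.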
Since the derivative of log x is everywhere positive (x > 0),
y = log x is an increasing function. Further log 1 = 0, so that log x < 0
for 0 < x < 1, log x = 0 for x = 1, and log x > 0 for x > 1. Moreover,
since $\dfrac{1}{x}$ decreases, $D \log x = \dfrac{1}{x}$ also decreases. The curve y = log x is of
the form shown in Fig. 12. above, the slope of the tangent falling
steadily as x increases. Finally:

DEFINITION: the constant e is such that $\int_1^e \dfrac{du}{u} = 1$.

This implies that log e = 1.
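
The definitions of log x and of e can be tried out numerically. This is a rough sketch, not the author's construction: the integral is estimated by a midpoint rule, the helper names `log_by_integral` and `find_e` are illustrative, and the bracket [2, 3] used to locate e is an assumption (bisection is justified because log x is increasing).

```python
def log_by_integral(x, n=20000):
    """Midpoint-rule estimate of the integral of du/u from 1 to x (x > 0)."""
    width = (x - 1.0) / n
    return sum(1.0 / (1.0 + (i + 0.5) * width) for i in range(n)) * width


def find_e(lo=2.0, hi=3.0, steps=40):
    """Bisection for the x at which log x = 1, assuming it lies in [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if log_by_integral(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x, y = 3.0, 1.5
lhs = log_by_integral(x * y)                      # log(xy)
rhs = log_by_integral(x) + log_by_integral(y)     # log x + log y
e_estimate = find_e()

print(lhs, rhs, e_estimate)
```

The two sides of property (2) agree to within the accuracy of the integration rule, and the bisection settles near 2.71828.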

A continuous and increasing function has an inverse which is also 
continuous and increasing. Hence: 

DEFINITION: y = exp x if x = log y for all x and for y > 0.
Two basic properties follow:

(i) exp x is continuous with derivative D exp x = exp x > 0 (all x)

(ii) $\exp(x + y) = \exp x \times \exp y$; $\exp(-x) = \dfrac{1}{\exp x}$;
$\exp(x - y) = \dfrac{\exp x}{\exp y}$; $(\exp x)^n = \exp nx$ for integral n.


Proof: the inverse rule for derivatives establishes (i):
\[ D_x \exp x = \frac{1}{D_y \log y} = \frac{1}{1/y} = y = \exp x. \]

Properties (ii) follow from the corresponding properties (2) of the
logarithmic function.

The curve y = exp x is the same as that for y = log x, with axes
interchanged. It is of the form shown in Fig. 12.3 above. Hence
y = exp x is an increasing function, the tangent having increasing
slope; y < 1 for x < 0, y = 1 at x = 0, and y > 1 for x > 0.
To relate to the constant e, such that log e = 1, we have exp 1 = e.
Hence:
$e^n = (\exp 1)^n = \exp n$ for integral n, by (ii).
This can be extended to:
$e^r = \exp r$ for rational r.


The rest is a matter of notation: 


NOTATION: the power $e^x = \exp x$ for all real x.


If x is rational, then $e^x$ is the power of e in the sense of elementary
algebra; if x is not rational, $e^x$ is simply taken as exp x. The properties
(ii) are then:
\[ e^{x+y} = e^x e^y; \quad e^{-x} = \frac{1}{e^x}; \quad e^{x-y} = \frac{e^x}{e^y}; \quad (e^x)^y = e^{xy}. \]

The expansion of the exponential function follows from Taylor's
series:
\[ e^x = \exp x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!} + \ldots \]
which is absolutely and uniformly convergent for all x. The expansion
of the logarithmic function log (1 + x) is derived from the
geometric series by integration term by term:
\[ \frac{1}{1+x} = 1 - x + x^2 - x^3 + \ldots + (-1)^{n-1} x^{n-1} + \ldots \]
\[ \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots + (-1)^{n-1} \frac{x^n}{n} + \ldots \]
These are absolutely and uniformly convergent for |x| < 1.
As a particular case of the exponential expansion for x = 1:
\[ e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \ldots + \frac{1}{n!} + \ldots \]
The rational approximation to e = 2.71828... is derived from this
series, by taking a sufficient number of terms.
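
The rational approximations to e can be produced exactly with rational arithmetic. A minimal sketch, in which the helper name `e_partial_sum` and the choice of 12 terms are illustrative:

```python
from fractions import Fraction
from math import factorial


def e_partial_sum(terms):
    """Exact rational value of 1 + 1/1! + 1/2! + ..., using `terms` terms in all."""
    return sum(Fraction(1, factorial(k)) for k in range(terms))


approx = e_partial_sum(12)        # terms up to 1/11!
print(approx, float(approx))
```

With 12 terms the remainder is of the order of 1/12!, so the rational value already agrees with 2.71828... to eight decimal places.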


15.8. Circular functions. (Reference: Chapter 12.) Another single
integral provides all the circular functions. It is again the integral
of a simple algebraic function $y = \dfrac{1}{1+x^2}$, continuous and increasing
for x < 0, decreasing for x > 0.


DEFINITION: $\tan^{-1} x = \int_0^x \dfrac{du}{1+u^2}$ for all x.


The basic properties are:

(1) $\tan^{-1} x$ is continuous and increasing, with $D \tan^{-1} x = \dfrac{1}{1+x^2} > 0$,
all x.

(2) $\tan^{-1} x + \tan^{-1} y = \tan^{-1} \dfrac{x+y}{1-xy}$

Proof: (1) follows from the Fundamental Theorem of the Calculus.
To establish (2), write $u = \dfrac{t-y}{1+ty}$ or $t = \dfrac{u+y}{1-uy}$ in the definition. Here,
as u runs from 0 to x, t runs from y to z where $z = \dfrac{x+y}{1-xy}$. So:
\[ \tan^{-1} x = \int_0^x \frac{du}{1+u^2} = \int_y^z \frac{1}{1+u^2}\,\frac{du}{dt}\,dt \]
where
\[ \frac{du}{dt} = \frac{(1+ty) - y(t-y)}{(1+ty)^2} = \frac{1+y^2}{(1+ty)^2} \]
and
\[ 1+u^2 = \frac{(1+ty)^2 + (t-y)^2}{(1+ty)^2} = \frac{(1+t^2)(1+y^2)}{(1+ty)^2}. \]
Hence:
\[ \tan^{-1} x = \int_y^z \frac{dt}{1+t^2} = \int_0^z \frac{dt}{1+t^2} - \int_0^y \frac{dt}{1+t^2} = \tan^{-1} z - \tan^{-1} y. \]
So: $\tan^{-1} x + \tan^{-1} y = \tan^{-1} z = \tan^{-1} \dfrac{x+y}{1-xy}$. Q.E.D.


DEFINITION: the constant $\pi$ is such that $\tfrac14\pi = \int_0^1 \dfrac{du}{1+u^2}$.

This implies that $\tan^{-1} 1 = \tfrac14\pi$. The range of $y = \tan^{-1} x$ is still to be
found. It comes by substituting $u = \dfrac{1}{t}$ in the definition, t going from
1 to 0 as u increases from 1:
\[ \int_1^\infty \frac{du}{1+u^2} = \int_1^0 \frac{1}{1 + 1/t^2}\left(-\frac{1}{t^2}\right) dt = \int_0^1 \frac{dt}{1+t^2}. \]
So:
\[ \int_0^\infty \frac{du}{1+u^2} = \int_0^1 \frac{du}{1+u^2} + \int_1^\infty \frac{du}{1+u^2} = 2\int_0^1 \frac{du}{1+u^2} = \tfrac12\pi. \]

Hence, $y = \tan^{-1} x \to \tfrac12\pi$ as $x \to \infty$. Similarly, $y \to -\tfrac12\pi$ as $x \to -\infty$.
From the definition, property (1) and these results, it follows that
$y = \tan^{-1} x$ is an increasing function, that $\tan^{-1} x < 0$ for x < 0,
$\tan^{-1} x = 0$ at x = 0 and $\tan^{-1} x > 0$ for x > 0, and that $\tan^{-1} x \to \pm\tfrac12\pi$
as $x \to \pm\infty$. The curve $y = \tan^{-1} x$ is of the form shown in Fig. 15.8a.
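
The integral definition can be exercised directly. In this sketch (an illustration under assumptions, with a midpoint rule standing in for the integral and `arctan_by_integral` a hypothetical helper name) $\pi$ is estimated from $\tfrac14\pi = \tan^{-1} 1$, and property (2) is checked for $x = \tfrac12$, $y = \tfrac13$, for which $z = 1$.

```python
def arctan_by_integral(x, n=20000):
    """Midpoint-rule estimate of the integral of du/(1+u^2) from 0 to x."""
    width = x / n
    return sum(1.0 / (1.0 + ((i + 0.5) * width) ** 2) for i in range(n)) * width


pi_estimate = 4 * arctan_by_integral(1.0)       # from (1/4)pi = tan^-1 1

x, y = 0.5, 1.0 / 3.0
z = (x + y) / (1 - x * y)                        # z = (x + y) / (1 - xy)
lhs = arctan_by_integral(x) + arctan_by_integral(y)
rhs = arctan_by_integral(z)

print(pi_estimate, lhs, rhs)
```

Both sides of (2) come out close to $\tfrac14\pi$, as expected for this choice of x and y.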
A continuous and increasing function has an inverse which is also
continuous and increasing. Hence:


DEFINITION: y = tan x if $x = \tan^{-1} y$ for x in the open interval
$-\tfrac12\pi < x < \tfrac12\pi$.

The range of y = tan x is all y, and $y \to \pm\infty$ as $x \to \pm\tfrac12\pi$.
The basic properties are:

(i) tan x is continuous and increasing, with
$D \tan x = 1 + \tan^2 x > 0$ $(-\tfrac12\pi < x < \tfrac12\pi)$

(ii) Addition Formula:
\[ \tan(x+y) = \frac{\tan x + \tan y}{1 - \tan x \tan y} \]

Proof: the inverse function rule gives (i)
\[ D_x \tan x = \frac{1}{D_y \tan^{-1} y} = 1 + y^2 = 1 + \tan^2 x. \]
(ii) follows from the corresponding result (2) for $\tan^{-1} x$.
The curve y = tan x on the domain $-\tfrac12\pi < x < \tfrac12\pi$ is that of
$y = \tan^{-1} x$ with axes interchanged. It is continuous and increasing,
as shown in Fig. 15.8b.
Two other circular functions follow:


DEFINITION: $\cos x = \dfrac{1}{\sqrt{1+t^2}}$ and $\sin x = \dfrac{t}{\sqrt{1+t^2}}$ where
$t = \tan x$ $(-\tfrac12\pi < x < \tfrac12\pi)$.

The basic properties are:

(a) $\tan x = \dfrac{\sin x}{\cos x}$; $\sin^2 x + \cos^2 x = 1$

(b) cos x increases $(-\tfrac12\pi < x < 0)$
and then decreases $(0 < x < \tfrac12\pi)$
with derivative
D cos x = -sin x;
sin x increases $(-\tfrac12\pi < x < \tfrac12\pi)$
and D sin x = cos x.

(c) Addition Formulae:

cos (x + y) = cos x cos y - sin x sin y
sin (x + y) = sin x cos y + cos x sin y.

Proof: (a) follows from the definition. The derivatives of (b) are
obtained by the function of a function rule with t = tan x and
$D_x t = 1 + \tan^2 x = 1 + t^2$:
\[ D_x \cos x = D_t\!\left(\frac{1}{\sqrt{1+t^2}}\right) D_x t = -\frac12\,\frac{2t}{(1+t^2)^{3/2}}\,(1+t^2) = -\frac{t}{\sqrt{1+t^2}} = -\sin x \]
and similarly for $D_x \sin x$. The signs of cos x and sin x come from the
definition and the signs of tan x: cos x > 0 $(-\tfrac12\pi < x < \tfrac12\pi)$, sin x < 0
$(-\tfrac12\pi < x < 0)$ and sin x > 0 $(0 < x < \tfrac12\pi)$. The signs of D cos x and
D sin x follow and so the increasing/decreasing properties as stated.

[Fig. 15.8b: the curve y = tan x]

The addition formulae (c) are derived from (ii) for tan x:


\[ \cos^2(x+y) = \frac{1}{1 + \tan^2(x+y)} = \frac{(1 - \tan x \tan y)^2}{(1 - \tan x \tan y)^2 + (\tan x + \tan y)^2} \]
\[ = \frac{(1 - \tan x \tan y)^2}{1 + \tan^2 x + \tan^2 y + \tan^2 x \tan^2 y} = \frac{(1 - \tan x \tan y)^2}{(1 + \tan^2 x)(1 + \tan^2 y)} \]

Put $\tan x = \dfrac{\sin x}{\cos x}$ and $\tan y = \dfrac{\sin y}{\cos y}$ and use
$\sin^2 x + \cos^2 x = \sin^2 y + \cos^2 y = 1$:
\[ \cos(x+y) = \sqrt{\frac{(\cos x \cos y - \sin x \sin y)^2}{(\sin^2 x + \cos^2 x)(\sin^2 y + \cos^2 y)}} = \cos x \cos y - \sin x \sin y. \]
(sin? x + cos? 2) (sin? y + cos? y) 


The other addition formula follows similarly.
Both cos x and sin x can be defined on the closed interval
$-\tfrac12\pi \le x \le \tfrac12\pi$. As $x \to \tfrac12\pi$ and $t = \tan x \to \infty$, $\cos x = \dfrac{1}{\sqrt{1+t^2}} \to 0$ and
$\sin x = \dfrac{t}{\sqrt{1+t^2}} = \dfrac{1}{\sqrt{1 + 1/t^2}} \to 1$. Similarly, as $x \to -\tfrac12\pi$, $\cos x \to 0$ and $\sin x \to -1$.

So, write $\cos(\pm\tfrac12\pi) = 0$ and $\sin(\tfrac12\pi) = 1$, $\sin(-\tfrac12\pi) = -1$. From the
definition, cos 0 = 1 and sin 0 = 0. Hence the curves y = cos x and
y = sin x are as shown in Fig. 15.8b for $-\tfrac12\pi \le x \le \tfrac12\pi$.

To extend the domain to all x is a matter of convention: 

NOTATION: $\cos(x + \pi) = -\cos x$; $\sin(x + \pi) = -\sin x$;

$\tan(x + \pi) = \tan x$.

Hence, as shown in 12.5, the functions y = cos x and y = sin x are
defined for all x and have period $2\pi$. The function y = tan x is defined
for all x ($x \ne \tfrac12\pi + n\pi$, for integral n) and it has period $\pi$.


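The expansion of the inverse tangent is obtained by integration
term by term of the geometric series:
\[ \frac{1}{1+x^2} = 1 - x^2 + x^4 - \ldots + (-1)^n x^{2n} + \ldots \]
\[ \tan^{-1} x = x - \frac{x^3}{3} + \frac{x^5}{5} - \ldots + (-1)^n \frac{x^{2n+1}}{2n+1} + \ldots \]
absolutely and uniformly convergent for |x| < 1. The expansions of
the cosine and sine functions are given by Taylor's Series:
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots + (-1)^n \frac{x^{2n}}{(2n)!} + \ldots \]
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + \ldots \]
absolutely and uniformly convergent for all x.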


15.9. Linear algebra. (Reference: Chapter 13.) The general concept 
of a vector space, 15.3(v) above, is developed: 

(i) Vector Space V over the Field F. Write v for a vector of V and a
for a scalar from F, denoting particular vectors and scalars by
subscripts. The basic construction is that of linear dependence:




DEFINITION: The set $\{v_1, v_2, \ldots, v_m\}$ is linearly dependent if there is a
set $\{a_1, a_2, \ldots, a_m\}$ of scalars, not all zero, so that $\sum_{r=1}^{m} a_r v_r = 0$.


It follows that, if $\{v, v_1, v_2, \ldots, v_m\}$ are linearly dependent and such
that $a \ne 0$ in the set of corresponding scalars $\{a, a_1, a_2, \ldots, a_m\}$, then
v can be expressed as a linear combination of the set $\{v_1, v_2, \ldots, v_m\}$:
\[ v = \sum_{r=1}^{m} \lambda_r v_r \quad \text{for some scalars } \lambda_1, \lambda_2, \ldots, \lambda_m. \]
It is only necessary to identify $\lambda_r$ with $-a_r/a$ ($a \ne 0$), and some of the
$\lambda$'s can be zero.* It also follows that a set $\{v_1, v_2, \ldots, v_m\}$ of m vectors is
linearly independent if no set $\{a_1, a_2, \ldots, a_m\}$ of scalars exists for
which $\sum_{r=1}^{m} a_r v_r = 0$, the case $a_1 = a_2 = \ldots = a_m = 0$ being excluded. Hence, if the
m vectors are linearly independent, then $\sum_{r=1}^{m} a_r v_r \ne 0$ for any scalars
not all zero. No one vector of a linearly independent set is a linear
combination of the others.


DEFINITION: The set $S_m = \{v_1, v_2, \ldots, v_m\}$ spans the vector space V if
every vector of V is a linear combination of $S_m$:
\[ v = \sum_{r=1}^{m} \lambda_r v_r \quad \text{for every } v \text{ and for some scalars } \lambda_1, \lambda_2, \ldots, \lambda_m. \]

If there is no such set $S_m$, V is said to be of infinite dimension. Otherwise,
from the several (at least one) integers m such that a set $S_m$
spans V, pick out the smallest integer n. V is then said to have
dimension n. If V is of dimension n, no set of fewer than n vectors
spans V but there is at least one set of n vectors which does.

The main property of a vector space V of dimension n is: 

THEOREM: If $S_k = \{v_1, v_2, \ldots, v_k\}$ is any set of vectors of V of dimension n:


(i) k < n implies that $S_k$ cannot span V
(ii) k > n implies that $S_k$ cannot be linearly independent
(iii) k = n implies that $S_k$ is linearly independent when it spans V and
$S_k$ spans V when it is linearly independent.


* Indeed all the $\lambda$'s can be zero, so that the zero vector v = 0 is a linear combination
of any set of non-zero vectors, a rather trivial case.




Here (i) follows from the definition and (iii) is a consequence of the
others; (ii) is the case which requires proof. We first prove the
following proposition. Suppose that every vector of a set
$S_k = \{v_1, v_2, \ldots, v_k\}$
of k linearly independent vectors is a linear combination of a set
$\{u_1, u_2, \ldots, u_n\}$ of n vectors:
\[ v_i = \sum_{r=1}^{n} \lambda_{ir} u_r \quad \text{for some } \lambda_{ir} \ (i = 1, 2, \ldots, k;\ r = 1, 2, \ldots, n) \qquad (1) \]

The proposition is that $k \le n$. The proof is by mathematical induction
on k. The proposition is obvious for k = 1; if it holds for k - 1, it must
then be shown to hold for k. The induction hypothesis is: if each of a set of
k - 1 linearly independent vectors is a linear combination of a set of
n vectors, then $k - 1 \le n$, whatever n may be. There are two possibilities
to consider. First, it may be that $\lambda_{1n} = \lambda_{2n} = \ldots = \lambda_{kn} = 0$ so
that $u_n$ does not appear in (1). Hence each vector of $S_k$ is a linear
combination of n - 1 vectors. Drop one of $S_k$ to get a set of k - 1
linearly independent vectors, each a linear combination of n - 1
vectors. By the induction hypothesis, $k - 1 \le n - 1$, i.e. $k \le n$ and the
induction is complete. Second, if at least one of these $\lambda$'s is not zero,
we can take it (by suitable ordering of vectors in $S_k$) as $\lambda_{kn} \ne 0$. Drop
$v_k$ from $S_k$ to get the linearly independent set $S_{k-1} = \{v_1, v_2, \ldots, v_{k-1}\}$
of k - 1 vectors. Write:
\[ w_i = v_i - \frac{\lambda_{in}}{\lambda_{kn}}\, v_k \quad (i = 1, 2, \ldots, k-1) \qquad (2) \]
and assemble into a set $S_{k-1}' = \{w_1, w_2, \ldots, w_{k-1}\}$. Substitute (1) into
(2) and notice that $u_n$ disappears, leaving $w_i$ as a linear combination
of $\{u_1, u_2, \ldots, u_{n-1}\}$. Further, since the v's are linearly independent,
(2) ensures that the w's are also. Hence the set $S_{k-1}'$ has k - 1 linearly
independent vectors, each a linear combination of n - 1 vectors. The
induction hypothesis gives $k - 1 \le n - 1$, i.e. $k \le n$. Again the induction
is complete. The proposition is established.


Proof of (ii): V is of dimension n and a set $\{u_1, u_2, \ldots, u_n\}$ of n vectors
exists so that every v of V is a linear combination of them. If
$S_k = \{v_1, v_2, \ldots, v_k\}$ is linearly independent, and (like all vectors) each
is a linear combination of the n u's, then by our proposition $k \le n$.
Hence, if k > n, $S_k$ is not linearly independent. Q.E.D.
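
Part (ii) can be illustrated for n-tuples of rational numbers: given more than n vectors, a set of scalars, not all zero, with $\sum a_r v_r = 0$ can always be found by elimination. The following is a sketch only; the helper name `find_dependence` and the sample vectors are assumptions made for the illustration, and row reduction is a standard technique swapped in here, not the induction argument of the proof.

```python
from fractions import Fraction


def find_dependence(vectors):
    """Return scalars a_r, not all zero, with sum(a_r * v_r) = 0, or None if none exist."""
    m = len(vectors)              # number of vectors
    n = len(vectors[0])           # each vector is an n-tuple
    # Matrix with the vectors as columns: n rows, m columns.
    rows = [[Fraction(vectors[c][r]) for c in range(m)] for r in range(n)]
    pivot_cols = []
    row = 0
    for col in range(m):
        if row == n:
            break
        pivot = next((i for i in range(row, n) if rows[i][col] != 0), None)
        if pivot is None:
            continue              # no pivot here: this column is "free"
        rows[row], rows[pivot] = rows[pivot], rows[row]
        p = rows[row][col]
        rows[row] = [x / p for x in rows[row]]
        for i in range(n):
            if i != row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[row])]
        pivot_cols.append(col)
        row += 1
    free = [c for c in range(m) if c not in pivot_cols]
    if not free:
        return None               # the set is linearly independent
    # Give one free column the scalar 1 and solve for the pivot columns.
    coeffs = [Fraction(0)] * m
    coeffs[free[0]] = Fraction(1)
    for r, pc in enumerate(pivot_cols):
        coeffs[pc] = -rows[r][free[0]]
    return coeffs


three_in_plane = [(1, 0), (1, 1), (2, 3)]     # k = 3 vectors, n = 2
coeffs = find_dependence(three_in_plane)
print(coeffs)
```

For the three sample 2-tuples the scalars found are 1, -3, 1, and indeed (1, 0) - 3(1, 1) + (2, 3) = (0, 0).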




From the theorem, we can say of a vector space V of dimension n 
that fewer than n vectors may be linearly independent but cannot 
span V; more than n vectors may span V but cannot be linearly 
independent. Both properties can hold only for a set of precisely n 
vectors. Hence: 


DEFINITION: A basis for V of dimension n is a set $S_n = \{v_1, v_2, \ldots, v_n\}$
of n vectors, both linearly independent and spanning V.
A basis always exists; but it need not be unique. Usually there are
many possible bases for V, all of precisely n vectors. To summarise:


A vector space V over F of dimension n has at least one linearly
independent set $S_n = \{v_1, v_2, \ldots, v_n\}$ as a basis. $S_n$ spans V:
\[ v = \sum_{r=1}^{n} \lambda_r v_r \quad \text{for some scalars } \lambda_1, \lambda_2, \ldots, \lambda_n \]

$S_n$ is at the same time the largest of all linearly independent sets of
V and the smallest of all sets spanning V.


Notice, in particular, that any set of n + 1 vectors must be linearly
dependent and one of them is a linear combination of the others;
a set of n - 1 vectors cannot span V and not all vectors of V can be
expressed as linear combinations of them.

(ii) Space $V_n(F)$ of n-tuples. Write $v = (x_1, x_2, \ldots, x_n)$ as an n-tuple
of elements taken from the field F of scalars. Define vector addition:
if $v = (x_1, x_2, \ldots, x_n)$ and $v' = (x_1', x_2', \ldots, x_n')$, then
\[ v + v' = (x_1 + x_1',\ x_2 + x_2',\ \ldots,\ x_n + x_n'). \]
Then the v's are an additive group, the zero vector being
0 = (0, 0, ... 0).
Take a scalar a from F and define the scalar product:
if $v = (x_1, x_2, \ldots, x_n)$, then $av = (ax_1, ax_2, \ldots, ax_n)$.

Then the v's form a vector space over F, denoted $V_n(F)$. A basis for
$V_n(F)$, and hence its dimension, are easily found. Write the vectors:
$\epsilon_1 = (1, 0, 0, \ldots, 0)$; $\epsilon_2 = (0, 1, 0, \ldots, 0)$; ... $\epsilon_n = (0, 0, 0, \ldots, 1)$

where $\epsilon_r$ has 1 in the rth place and 0's elsewhere. Then:


THEOREM: The set $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ is a basis for the vector space $V_n(F)$
of n-tuples and $V_n(F)$ has dimension n.


Proof: by the rules for sums and scalar products of n-tuples:
\[ (x_1, x_2, \ldots, x_n) = (x_1, 0, \ldots, 0) + (0, x_2, \ldots, 0) + \ldots + (0, 0, \ldots, x_n) \]
\[ = x_1(1, 0, \ldots, 0) + x_2(0, 1, \ldots, 0) + \ldots + x_n(0, 0, \ldots, 1) = x_1\epsilon_1 + x_2\epsilon_2 + \ldots + x_n\epsilon_n \]
i.e. the set $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ spans $V_n(F)$. Again:
\[ \sum_{r=1}^{n} a_r \epsilon_r = a_1(1, 0, \ldots, 0) + a_2(0, 1, \ldots, 0) + \ldots + a_n(0, 0, \ldots, 1) \]
\[ = (a_1, 0, \ldots, 0) + (0, a_2, \ldots, 0) + \ldots + (0, 0, \ldots, a_n) = (a_1, a_2, \ldots, a_n) \ne 0 \ \text{(unless all } a_r = 0) \]
i.e. the set $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ is linearly independent. It is a basis for $V_n(F)$
and, since it has n elements, $V_n(F)$ is of dimension n. Q.E.D.


This is not the only basis for $V_n(F)$. There are many others, e.g.
$v_1 = (1, 0, 0, \ldots, 0)$; $v_2 = (1, 1, 0, \ldots, 0)$; ... $v_n = (1, 1, 1, \ldots, 1)$.
The importance of $V_n(F)$ stems from the result:


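THEOREM: Any vector space V over F of dimension n is isomorphic
with $V_n(F)$, the isomorphism preserving sums and scalar products.


Proof: a basis for V is some set $S_n = \{v_1, v_2, \ldots, v_n\}$ and any vector
$v = \sum_{r=1}^{n} \lambda_r v_r$ for some scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ from F. These scalars are
unique. For, otherwise, let $v = \sum_{r=1}^{n} \mu_r v_r$ also and $\sum_{r=1}^{n} (\lambda_r - \mu_r) v_r = v - v = 0$,
i.e. $\lambda_r = \mu_r$, all r. Write $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$, a unique n-tuple of $V_n(F)$.
Hence, to v of V, there corresponds a unique $\lambda$ of $V_n(F)$. Conversely,
given $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$ of $V_n(F)$, then $\sum_{r=1}^{n} \lambda_r v_r$ is a vector of V by the
sum and scalar product rules. There is a one-one mapping, $v \leftrightarrow \lambda$, of
V onto $V_n(F)$. The mapping preserves sums: if $v \leftrightarrow \lambda$ and $v' \leftrightarrow \lambda'$, then
$v + v' = \sum_{r=1}^{n} (\lambda_r + \lambda_r') v_r$ and $\lambda + \lambda' = (\lambda_1 + \lambda_1', \lambda_2 + \lambda_2', \ldots, \lambda_n + \lambda_n')$, i.e.
$v + v' \leftrightarrow \lambda + \lambda'$. It preserves scalar products: if $v \leftrightarrow \lambda$ and a is a scalar,
then $av = \sum_{r=1}^{n} (a\lambda_r) v_r$ and $a\lambda = (a\lambda_1, a\lambda_2, \ldots, a\lambda_n)$, i.e. $av \leftrightarrow a\lambda$. The
mapping is an isomorphism. Q.E.D.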
In summarising, we allow first for the possibility that a vector
space V is of infinite dimension. A case in point is the set F[x] of
polynomials over a field F, a ring (integral domain) under the operations
of + and x. Using the same field F for scalars, we define scalar
products by (6) of 15.2 and we find that F[x] is a set of vectors
$v = (f_0, f_1, f_2, \ldots)$ over F. Here v is similar to the n-tuple $(x_1, x_2, \ldots, x_n)$
except that there is no fixed n. Hence, the integral domain F[x] is
also a vector space over F of infinite dimension.

Otherwise, if V is of dimension n, the last theorem shows that V
can be replaced for all algebraic purposes (up to isomorphism) by
the space $V_n(F)$ of n-tuples. Hence, any vector space of finite dimension
n can be effectively described solely by the integer n and the
field F. For F provides both the scalars for scalar products and the
constituents of the n-tuples of $V_n(F)$ equivalent to V. But $V_n(F)$ is
still a wide concept. It can be specialised by taking F as the field of
real numbers and by adding definitions of length and angle. $V_n(F)$
is then a space in the geometric sense. If length and angle are defined
as in 8.4, $V_n(F)$ becomes the Euclidean space $E_n$.

We have stressed that a ring (or field) and a vector space are two 
different systems of double composition. But, though different, they 
are not exclusive: a set S may be both a ring (or field) and a vector 
space. This is so if three operations, subject to the appropriate rules, 
are defined in S: sums, products and scalar products. An outside 
field F, different from S, is needed for scalars. Again, this does not 
rule out the possibility that F is a part of S. An example makes all 
this clear. The set C of complex numbers, z=x+iy=number pair 
(x, y), is a field under + and x. Scalar multiplication of number 
pairs by real scalars a can be defined: a(x, y)=(ax, ay), giving the
complex number (ax)+i(ay). Hence C is also a vector space, of 
dimension 2, over the field of real numbers. 

(iii) Linear Transformations. A transformation, in general, is a
mapping of one set into another. A linear transformation is the
particular case of a mapping of one vector space V into another V',
preserving the operations of addition and scalar multiplication.
This is the same as saying that linear combinations of vectors are
carried over from V to V' in the mapping. Hence the general and
abstract concept:


DEFINITION: If $V = \{v_1, v_2, \ldots\}$ and $V' = \{v_1', v_2', \ldots\}$ are two vector
spaces over the same field $F = \{\lambda_1, \lambda_2, \ldots\}$, a linear transformation of V
into V' is a mapping which carries a linear combination of vectors of V
into the same linear combination of vectors of V':
\[ \sum_{r=1}^{k} \lambda_r v_r \to \sum_{r=1}^{k} \lambda_r v_r' \quad \text{if } v_r \to v_r' \ (r = 1, 2, \ldots, k). \]


The mapping may be one-one but generally it is many-one. 

To bring the concept down to earth, take the case commonly used : 
two spaces V,,(/) and V,,(F) over the field F of real numbers. Write 
χ᾽ Ξε (αι, %,... 2.) as an n-tuple of V,(F) and y=(y;, ys, ... Ym) a8 an 
m-tuple of V,,(F). The linear transformation is then: xy. 

T 


As a basis for $V_n(F)$ take $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ where $\epsilon_s$ is the n-tuple with
1 in the sth place and 0's elsewhere. As a basis for $V_m(F)$ take
$\{\eta_1, \eta_2, \ldots, \eta_m\}$ where $\eta_r$ is the m-tuple with 1 in the rth place and
0's elsewhere. Then:

$$x = \sum_{s=1}^{n} x_s \epsilon_s \quad \text{and} \quad y = \sum_{r=1}^{m} y_r \eta_r \qquad (3)$$


Under T, each of $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ as a vector of $V_n(F)$ has an image in
$V_m(F)$, a certain m-tuple. With s = 1, 2, ... n, write them:

$$\epsilon_s \to (a_{1s}, a_{2s}, \ldots, a_{ms}) = a_{1s}\eta_1 + a_{2s}\eta_2 + \ldots + a_{ms}\eta_m$$

There are $m \times n$ scalars $a_{rs}$, for r = 1, 2, ... m and s = 1, 2, ... n. They
are fixed by the specification of the linear transformation T. So:

$$\epsilon_s \to \sum_{r=1}^{m} a_{rs} \eta_r \quad (s = 1, 2, \ldots, n) \qquad (4)$$


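Let x and y correspond under T. By (3) and (4):

$$x = \sum_{s=1}^{n} x_s \epsilon_s \to \sum_{s=1}^{n} x_s \sum_{r=1}^{m} a_{rs} \eta_r = \sum_{r=1}^{m} \Big( \sum_{s=1}^{n} a_{rs} x_s \Big) \eta_r$$

But $\sum_{s=1}^{n} x_s \epsilon_s \to \sum_{r=1}^{m} y_r \eta_r$.

So: $$y_r = \sum_{s=1}^{n} a_{rs} x_s \quad (r = 1, 2, \ldots, m) \qquad (5)$$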


The linear transformation T reduces to the m equations (5) and these
determine $y_1, y_2, \ldots, y_m$ when $x_1, x_2, \ldots, x_n$ are given. Hence, the
linear transformation is completely specified by the set of $m \times n$
scalars $a_{rs}$, i.e. by the matrix $A = \|a_{rs}\|$.




THEOREM: A linear transformation T of $V_n(F)$ into $V_m(F)$ is
described by a matrix $A = \|a_{rs}\|$ of order $m \times n$ and T is:

$$y_r = \sum_{s=1}^{n} a_{rs} x_s \quad (r = 1, 2, \ldots, m)$$

where T carries $(x_1, x_2, \ldots, x_n)$ of $V_n(F)$ into $(y_1, y_2, \ldots, y_m)$ of $V_m(F)$.
To determine A for a given transformation T, it is only necessary to
find what m-tuples in $V_m(F)$ correspond to the basis $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ of
$V_n(F)$. The constituents of the m-tuple corresponding to $\epsilon_1$ are the
first column of A, and so on. In the simplest case (as in 7.5) n = m = 2
and (5) are:

$$y_1 = a_{11}x_1 + a_{12}x_2 \quad \text{and} \quad y_2 = a_{21}x_1 + a_{22}x_2.$$

There is, however, no need that n = m in a linear transformation.
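The recipe just described (find the images of the basis vectors and set them down as columns) can be tried out in a short Python sketch. The particular map `T` below, from 3-tuples to 2-tuples, is a hypothetical example chosen only for illustration, and the helper names `matrix_of` and `apply` are not part of the text.

```python
def matrix_of(T, n):
    """Build the m x n matrix A of a linear map T acting on n-tuples."""
    columns = []
    for s in range(n):
        # Basis vector: 1 in place s, 0's elsewhere.
        e_s = tuple(1 if i == s else 0 for i in range(n))
        columns.append(tuple(T(e_s)))  # the image of e_s is column s of A
    m = len(columns[0])
    return [[columns[s][r] for s in range(n)] for r in range(m)]


def apply(A, x):
    """Equations (5): y_r is the sum over s of a_rs * x_s."""
    return tuple(sum(a * x_s for a, x_s in zip(row, x)) for row in A)


def T(x):
    """Hypothetical linear map of V_3 into V_2, used only as an example."""
    x1, x2, x3 = x
    return (x1 + 2 * x2, 3 * x2 - x3)


A = matrix_of(T, 3)
x = (1, 4, 2)
print(A)                  # matrix with m = 2 rows and n = 3 columns
print(apply(A, x), T(x))  # the two results agree
```

Here n = 3 and m = 2, so the example also shows that n and m need not be equal.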

(iv) Matrices and Rank. The set of all n-tuples from a field F is a
vector space $V_n(F)$ of dimension n. A sub-set of n-tuples can still
be a vector space in its own right, of dimension less than n. For
example, the set of points in three dimensions forms a vector space
$V_3(F)$ while the subset of points lying on a plane through the origin is a
vector space $V_2(F)$. Apply this idea to rows and columns of a matrix $A = \|a_{rs}\|$
of order $m \times n$ with elements from the field of real numbers.

The rows of A are n-tuples $v_r = (a_{r1}, a_{r2}, \ldots, a_{rn})$, r = 1, 2, ... m. A
vector space V is got by taking all linear combinations of the $v_r$'s. Let
V be of dimension p. Then $p \le n$ where p = n only when V comprises
all n-tuples of real numbers and where p < n otherwise. Further,
$p \le m$ where p = m only when the $v_r$'s are linearly independent and
p < m otherwise. Hence $p \le$ smaller of m, n. Here p is the row rank of
the matrix A.

Alternatively, the columns of A are m-tuples $w_s = (a_{1s}, a_{2s}, \ldots, a_{ms})$,
s = 1, 2, ... n. Suppose the vector space W, got from all linear
combinations of the $w_s$'s, has dimension w. Again, $w \le$ smaller of m, n.
Here w is the column rank of A. The basic result is:


THEOREM: A matrix A of order $m \times n$ has the same row and column
rank (p = w), with p linearly independent and m - p linearly dependent
rows, with p linearly independent and n - p linearly dependent columns.
The rank of A is p and $p \le m$, $p \le n$. Proof:

Suppose p < w and arrange A so that the first p rows and the first w
columns are linearly independent. From the n-tuple

$$v_r = (a_{r1}, a_{r2}, \ldots, a_{rn})$$

form the w-tuple $\bar{v}_r = (a_{r1}, a_{r2}, \ldots, a_{rw})$.

But: $$v_k = \sum_{r=1}^{p} \lambda_{kr} v_r \quad (k = 1, 2, \ldots, m) \qquad (6)$$

for some λ's not all zero, expressing the linear dependence on the first
p rows. Since (6) implies the same relations for the 'smaller' w-tuples:

$$\bar{v}_k = \sum_{r=1}^{p} \lambda_{kr} \bar{v}_r \quad (k = 1, 2, \ldots, m) \qquad (7)$$

Write:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1w}x_w &= 0 \\
a_{21}x_1 + a_{22}x_2 + \ldots + a_{2w}x_w &= 0 \\
\ldots \qquad & \\
a_{p1}x_1 + a_{p2}x_2 + \ldots + a_{pw}x_w &= 0
\end{aligned} \qquad (8)$$

as p equations in w variables $x_1, x_2, \ldots, x_w$. Given p < w and from 13.8
above, x's not all zero exist as a solution of (8), a non-zero value of
(at least) one of the x's being assigned in advance. From (7) and (8):

$$\begin{aligned}
a_{k1}x_1 + a_{k2}x_2 + \ldots + a_{kw}x_w
&= \Big(\sum_{r=1}^{p} \lambda_{kr} a_{r1}\Big) x_1 + \Big(\sum_{r=1}^{p} \lambda_{kr} a_{r2}\Big) x_2 + \ldots + \Big(\sum_{r=1}^{p} \lambda_{kr} a_{rw}\Big) x_w \\
&= \sum_{r=1}^{p} \lambda_{kr} (a_{r1}x_1 + a_{r2}x_2 + \ldots + a_{rw}x_w) = 0
\end{aligned}$$
r=1 


for some x's not all zero and for any k = 1, 2, ... m. Hence (8) extends
to a full set of m equations in the x's. Consequently, the first w
columns of A are linearly dependent, the multiples being the x's.
But w is defined so that the first w columns of A are linearly
independent. This is a contradiction. Hence $p \ge w$. Similarly, $w \ge p$.
Hence p = w. The rest of the theorem follows at once. Q.E.D.
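The equality of row rank and column rank can be checked on a small case. The following Python sketch counts independent tuples by Gaussian elimination over exact fractions; the elimination routine and the sample matrix are illustrative assumptions, not the method of the proof above.

```python
from fractions import Fraction


def rank(tuples):
    """Dimension of the space spanned by the given tuples (Gaussian elimination)."""
    M = [[Fraction(v) for v in t] for t in tuples]
    r = 0  # number of independent tuples found so far
    width = len(M[0]) if M else 0
    for c in range(width):
        # Look for a tuple, not yet used, with a non-zero entry in place c.
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Hypothetical 3 x 4 matrix: the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
columns = [list(col) for col in zip(*A)]

row_rank = rank(A)
column_rank = rank(columns)
print(row_rank, column_rank)
```

For this matrix m = 3 and n = 4, and both ranks come out as 2, which is no greater than the smaller of m and n.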


APPENDIX 


FORMULAE OF ELEMENTARY 
ALGEBRA AND TRIGONOMETRY 


No detailed knowledge of elementary mathematics is assumed in this 
text but certain simple results are needed from time to time. The 
formulae are developed in this Appendix. Where proofs are not given, 
they are to be found in the standard school texts. 


A.1. Powers and exponents. Let a be a positive real number and n a
positive integer. Then $a^n$ is a notation for $a \times a \times \ldots \times a$ (n times).
From the definition, the following properties follow immediately:

$$a^m a^n = a^{m+n}; \quad (a^m)^n = a^{mn}; \quad (ab)^n = a^n b^n$$

for any positive real numbers a and b and any positive integers m and
n. An extension of the notation permits the expression $a^x$ to be written
for any positive real number a and any rational x. For positive
fractional x, $a^x$ is defined as a root of a; when x is negative, $a^x$ is defined
as a reciprocal. As special cases, $a^1$ is taken as a and $a^0$ stands for
unity*. The complete notation can be spelled out:


Notation: The expression $a^x$ is defined for a positive real number a
and various rational values of x:

x = n, where n is a positive integer:
$a^n = a \times a \times \ldots \times a$ (n times)
x = 0: $a^0 = 1$


* It may be said that $a^0$ must be 1 since $a^m a^n = a^{m+n}$ with m = 0 gives $a^0 a^n = a^{0+n} = a^n$
so that $a^0 = 1$. Similarly it may be said (e.g.) that $a^{1/2}$ must be $\sqrt{a}$, since $(a^m)^n = a^{mn}$
with m = 1/2 and n = 2 gives $(a^{1/2})^2 = a^{(1/2) \times 2} = a$ so that $a^{1/2} = \sqrt{a}$. This is a misconception. We
are at liberty to define $a^0$ and $a^{1/2}$ in any way we like; there is no 'must' about it. What
we choose to do is to take $a^0 = 1$ to preserve the rule $a^m a^n = a^{m+n}$ and $a^{1/2} = \sqrt{a}$ to
preserve the rule $(a^m)^n = a^{mn}$. We could choose otherwise, e.g. $a^0 = 0$ and $a^{1/2} = 1/a^2$,
but it would be wasteful. Generalisation in mathematics aims at preserving, rather
than scrapping, existing rules.




x = p/q, where p and q are positive integers:
$a^{p/q} = \sqrt[q]{a^p}$ = positive qth root of $a^p$
x = -r, where r is a positive rational:
$a^{-r} = \frac{1}{a^r}$ = reciprocal of $a^r$.

Here $a^x$ is the xth power of a and x is the exponent of the power.
The limitation a > 0 can be relaxed for some but not for all
exponents. It is in order to write $a^x$ for a < 0 when x is integral, but not
always when x is fractional. Powers like $a^3 = a \times a \times a$ and $a^{-2} = \frac{1}{a \times a}$
are valid for negative a. On the other hand $a^{1/2} = \sqrt{a}$ has no meaning
(within the domain of real numbers) if a is negative, though $a^{1/3} = \sqrt[3]{a}$
still holds. It is correct, moreover, to write $a^x = 0$ when a = 0, except
when x is a negative rational. For example, $0^{1/2} = \sqrt{0} = 0$ but $0^{-1/2} = \frac{1}{\sqrt{0}}$
has no meaning.

The properties given above for positive integral exponents remain
true for any rational exponents. The notation is designed to achieve
this:

$$a^x a^y = a^{x+y}; \quad (a^x)^y = a^{xy}; \quad (ab)^x = a^x b^x \qquad (1)$$

for any positive real numbers a and b and for any rational x and y.
Various applications of the properties (1) are involved in the
examples:

$$\sqrt{a^3}\,\sqrt{a^5} = a^{3/2} a^{5/2} = a^{3/2 + 5/2} = a^4$$

$$(ab)^3 a^{-5} = b^3 (a^3 a^{-5}) = b^3 a^{3-5} = b^3 a^{-2} = \frac{b^3}{a^2}$$


αὖ κα 
3 
a S/a 


A.2. Logarithms. A logarithm is the inverse of a power. If a is a
positive real number, then so is $a^y$ for any rational y (positive,
negative or zero).

Notation: If $x = a^y$, then $y = \log_a x$, read 'logarithm of x to base a',
defined for suitable positive real values of x.




Since a logarithm is, by definition, an exponent, the properties (1) 
of powers can be used to derive properties of logarithms: 


$$\log_a (xy) = \log_a x + \log_a y; \quad \log_a (x^b) = b \log_a x; \quad \log_a x = \log_a b \, \log_b x \qquad (2)$$


The last of (2) implies that the logarithms of various x to one base
a are simply re-scalings of the corresponding logarithms to a second
base b. Given a set of logarithms to the base b, the corresponding set
of logarithms to the base a are to be written by multiplying through
by the constant factor $\log_a b$. It is for this reason that the particular
base chosen for logarithms is of little importance; any convenient
base will do. Common logarithms as used in arithmetical work are
logarithms to the base 10, a very convenient base. So $y = \log_{10} x$
implies simply: $x = 10^y$.

There is a difficulty here: what do we mean by 'suitable' x in the
definition of the logarithm of x? Since $y = \log_a x$ means $x = a^y$ and
since exponents are (so far) limited to rationals, it follows that only
rational logarithms can be written as yet. So $\log_a x = y$ must be
rational and x must be a real number which is a rational power:
$x = a^y$. This is a very severe limitation.

Consider common logarithms and drop for convenience the reference
to the base 10. Then log x = y means $x = 10^y$ only for rational y.
Hence x must be of the form: $x = 10^{p/q} = \sqrt[q]{10^p}$ (p and q integers,
q > 0). So $0.01 = 10^{-2}$ has logarithm log 0.01 = -2, and $\sqrt{10} = 10^{0.5}$
has logarithm log $\sqrt{10}$ = 0.5. But, in fact, most real numbers x have
no (rational) logarithm. For example, take the positive integers
greater than 1. The only integers among 2, 3, 4, 5, ... with (rational)
logarithms are powers of 10. The integers 10, 100, 1000, ... have
logarithms 1, 2, 3, ...; any other integer k has no (rational) logarithm.
To see this: suppose log k = p/q (rational, p and q positive integers)
so that $k = 10^{p/q} = \sqrt[q]{10^p}$ and $k^q = 10^p$ = number with 0 as last digit in
the decimal system. This is impossible when k does not itself end in 0;
no integral power of such a k can have 0 in the last digit. Powers of 2
end in 2, 4, 6 or 8, powers of 3 in 1, 3, 7 or 9, and so on. (If k ends in 0
without being a power of 10, strip the final zeros and apply the same
argument to what is left.) Hence log k cannot be rational.
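The conclusion can be put as a small test on integers: strip factors of 10 and see whether anything is left over. The Python sketch below assumes the result just argued (an integer greater than 1 has a rational common logarithm only when it is a power of 10); the function name `rational_log10` is a hypothetical one.

```python
def rational_log10(k):
    """Return log k (base 10) when it is rational for the integer k > 1, else None."""
    if k <= 1:
        raise ValueError("k must be an integer greater than 1")
    exponent = 0
    while k % 10 == 0:  # strip one final zero at a time
        k //= 10
        exponent += 1
    # Only a pure power of 10 reduces to 1 in this way.
    return exponent if k == 1 else None


for k in (2, 3, 10, 20, 100, 1000):
    print(k, rational_log10(k))
```

The integer 20 is a multiple of 10 but not a power of 10, and the test reports no rational logarithm for it.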

There is no (rational) log x for values of x as simple as 2 or 3. Nor
are we able, as yet, to wriggle out by saying that log 2 or log 3 is
irrational; we have no definition of irrational powers of 10. We are in
an awkward situation; we cannot even justify the use of tables of
logarithms in ordinary arithmetic. The tables show for example:
log 2 = 0.30103 .... This is not a rational number (no rational power
of 10 gives 2). We are not entitled to take it as an irrational number
(no irrational powers of 10 are defined). The missing link is the
definition of irrational powers and it can be supplied, in Chapter 12, after
much water has passed under the bridges. It turns out to be true that
log 2 is irrational, approximated by the rational 0.30103 to five
decimal places. But this involves a surprisingly sophisticated concept.


A.3. Roots of polynomial equations. We are given a polynomial 
equation with real coefficients and of degree n, where n is a positive 
integer. We wish to solve it in the sense of finding values for all the 
roots. Moreover, we require a solution which is a general process or 
formula, applicable to all equations of given degree n, a process 
which involves only the operations of addition, subtraction, multi- 
plication and division, together with root extraction (square roots, 
cube roots, and so on). The position finally established by the work
of Galois (1811-32) is a very remarkable one. There are general
processes for n = 2, 3 and 4, a simple formula for the quadratic, much
more difficult processes for cubics and quartics; but it is just not
possible to solve all polynomial equations of degree $n \ge 5$ by such a process.

The general process for a quadratic is that of ‘completing the 
square’: 


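Given: $ax^2 + bx + c = 0$ ($a \ne 0$)

write: $$x^2 + \frac{b}{a}x + \Big(\frac{b}{2a}\Big)^2 = \frac{b^2}{4a^2} - \frac{c}{a}$$

i.e. $$\Big(x + \frac{b}{2a}\Big)^2 = \frac{b^2 - 4ac}{4a^2}$$

Make the transformation to another variable y: $x = y - \frac{b}{2a}$. Then:

$$y^2 = \frac{b^2 - 4ac}{4a^2} \quad \text{i.e.} \quad y = \pm \frac{1}{2a}\sqrt{b^2 - 4ac}.$$

So: $$x = \frac{1}{2a}\big(-b \pm \sqrt{b^2 - 4ac}\big) \qquad (3)$$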


The general formula (3) applies to any quadratic $ax^2 + bx + c = 0$ with
real coefficients a, b and c. The only requirement ($a \ne 0$) is needed to
ensure that the polynomial is indeed a quadratic. If $b^2 > 4ac$, the two
roots (3) are real and distinct (rational if the coefficients are rational
and $b^2 - 4ac$ is a perfect square); if $b^2 = 4ac$, there is a double root
$-\frac{b}{2a}$, which is rational when the coefficients are. On
the other hand, if $b^2 < 4ac$, then the two roots (3) are conjugate
complex.
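Formula (3) and the three cases for the sign of $b^2 - 4ac$ translate directly into a few lines of Python. This is a minimal sketch; the function name `solve_quadratic` and the wording of the case labels are assumptions made for the example, and `cmath.sqrt` is used so that a negative discriminant gives the conjugate complex pair.

```python
import cmath


def solve_quadratic(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 by formula (3), with the case that applies."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    disc = b * b - 4 * a * c
    root = cmath.sqrt(disc)  # a complex square root, valid for any sign of disc
    x1 = (-b + root) / (2 * a)
    x2 = (-b - root) / (2 * a)
    if disc > 0:
        kind = "real and distinct"
    elif disc == 0:
        kind = "double root"
    else:
        kind = "conjugate complex"
    return x1, x2, kind


print(solve_quadratic(1, -3, 2))  # x**2 - 3x + 2 = 0
print(solve_quadratic(1, 2, 1))   # x**2 + 2x + 1 = 0
print(solve_quadratic(1, 0, 1))   # x**2 + 1 = 0
```

The roots are returned as complex numbers throughout, with zero imaginary part when they are real.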

To illustrate that a general but complicated process can be ob- 
tained for the solution of a cubic or quartic equation, consider the 
following development for the cubic. It involves more than one 
transformation. The first step is that of 'completing the cube', to
get rid of $x^2$:

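Given: $ax^3 + bx^2 + cx + d = 0$ ($a \ne 0$)
divide through by a and arrange:

$$x^3 + 3\Big(\frac{b}{3a}\Big)x^2 + 3\Big(\frac{b}{3a}\Big)^2 x + \Big(\frac{b}{3a}\Big)^3 = \Big(\frac{b^2}{3a^2} - \frac{c}{a}\Big)x + \Big(\frac{b^3}{27a^3} - \frac{d}{a}\Big)$$

i.e. $$\Big(x + \frac{b}{3a}\Big)^3 = \Big(\frac{b^2}{3a^2} - \frac{c}{a}\Big)\Big(x + \frac{b}{3a}\Big) - \frac{b}{3a}\Big(\frac{b^2}{3a^2} - \frac{c}{a}\Big) + \Big(\frac{b^3}{27a^3} - \frac{d}{a}\Big)$$

Make the transformation $x = y - \frac{b}{3a}$ so that the equation becomes:

$$y^3 = \alpha y + \beta$$

where $\alpha = \frac{b^2}{3a^2} - \frac{c}{a}$ and $\beta = \frac{bc}{3a^2} - \frac{2b^3}{27a^3} - \frac{d}{a}$.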


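The next step is to get rid of the term in y. Make the transformation
$y = z + \frac{\alpha}{3z}$ ($z \ne 0$) as originally devised by Vieta (1540-1603):

$$\Big(z + \frac{\alpha}{3z}\Big)^3 = \alpha\Big(z + \frac{\alpha}{3z}\Big) + \beta$$

giving: $$z^3 - \beta + \frac{\alpha^3}{27z^3} = 0.$$

Finally, multiply through by $z^3$ and transform by $w = z^3$:

$$w^2 - \beta w + \frac{\alpha^3}{27} = 0.$$

This can be solved: $$w = \frac{1}{2}\Big(\beta \pm \sqrt{\beta^2 - \frac{4\alpha^3}{27}}\Big).$$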


Jobbing backwards, we obtain z, y and then x in succession. For each
of the two values of w, we write the three cube roots (as in 3.8) to get
z. For each of these, we get $y = z + \frac{\alpha}{3z}$ and finally $x = y - \frac{b}{3a}$. The result
is 2 × 3 = 6 values of x but these are found to be equal in pairs. The
outcome is a general process for obtaining the three roots of the
original cubic; two of the roots may be conjugate complex. The tricky
step is getting the three cube roots of w to provide z.
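The whole process for the cubic can be followed numerically. The Python sketch below is one possible arrangement, not a definitive one: it takes only one of the two values of w (the one of larger modulus, an assumption made to avoid z = 0), since the other value merely repeats the same three roots, and it works in floating-point complex arithmetic. The name `solve_cubic` is hypothetical.

```python
import cmath


def solve_cubic(a, b, c, d):
    """Three roots of a*x**3 + b*x**2 + c*x + d = 0 by the process above."""
    if a == 0:
        raise ValueError("a must be non-zero for a cubic")
    alpha = b * b / (3 * a * a) - c / a
    beta = b * c / (3 * a * a) - 2 * b ** 3 / (27 * a ** 3) - d / a
    shift = b / (3 * a)  # x = y - shift

    # Quadratic in w = z**3; keep the value of larger modulus.
    disc = cmath.sqrt(beta * beta - 4 * alpha ** 3 / 27)
    w = max((beta + disc) / 2, (beta - disc) / 2, key=abs)
    if w == 0:
        # Then alpha = beta = 0 and y**3 = 0: a triple root.
        return [complex(-shift)] * 3

    z0 = w ** (1 / 3)                    # one cube root of w
    omega = complex(-0.5, 3 ** 0.5 / 2)  # a complex cube root of unity
    roots = []
    for k in range(3):
        z = z0 * omega ** k              # the three cube roots of w in turn
        y = z + alpha / (3 * z)
        roots.append(y - shift)
    return roots


# x**3 - 6x**2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
for x in solve_cubic(1, -6, 11, -6):
    print(x)
```

Because of rounding, real roots may appear with a very small imaginary part.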


A.4. Solution of two linear equations. As an extension, we can attempt 
to solve two polynomial equations, each involving two variables. 
The simplest case is that of linear equations: 


$$a_1x + b_1y + c_1 = 0 \quad \text{and} \quad a_2x + b_2y + c_2 = 0$$

where all six coefficients have real values and where the variables x
and y are real. Suppose that $a_1b_2 - a_2b_1 \ne 0$. If $b_2 \ne 0$, eliminate y by
use of the second equation:

$$a_1x + b_1\Big(-\frac{a_2x + c_2}{b_2}\Big) + c_1 = 0$$

i.e. $$x = \frac{b_1c_2 - b_2c_1}{a_1b_2 - a_2b_1} \quad \text{and} \quad y = -\frac{a_2}{b_2}x - \frac{c_2}{b_2} = \frac{c_1a_2 - c_2a_1}{a_1b_2 - a_2b_1}$$

So: $$\frac{x}{b_1c_2 - b_2c_1} = \frac{y}{c_1a_2 - c_2a_1} = \frac{1}{a_1b_2 - a_2b_1} \quad (a_1b_2 - a_2b_1 \ne 0) \qquad (4)$$

On the other hand, if $b_2 = 0$, then $b_1 \ne 0$ to ensure that $a_1b_2 - a_2b_1 \ne 0$.
In this case, y can be eliminated by use of the first equation and the
same result (4) follows. Hence, the equations have a unique solution
(4), provided that $a_1b_2 - a_2b_1 \ne 0$.

This is the main case. The difficulty is that there are other cases,
which may be called degenerate cases, arising when $a_1b_2 - a_2b_1 = 0$.

Write (4) as $\frac{x}{A} = \frac{y}{B} = \frac{1}{C}$ where $A = b_1c_2 - b_2c_1$, $B = c_1a_2 - c_2a_1$ and
$C = a_1b_2 - a_2b_1$. The main case has $C \ne 0$ and it does not matter
whether either (or both) of A and B is zero. If A = 0, then $x = \frac{0}{C} = 0$;
if B = 0, then $y = \frac{0}{C} = 0$. There are two degenerate cases, with C = 0.

(i) C = 0 and either $A \ne 0$ or $B \ne 0$ (or both).

Speaking roughly, we say that (4) gives $x = \frac{A}{0}$ or $y = \frac{B}{0}$ (or both), i.e.
that x or y is 'infinite'. More strictly: $a_2 = \lambda a_1$, $b_2 = \lambda b_1$ but $c_2 \ne \lambda c_1$ for
some multiple λ. The equations are inconsistent, shown graphically
by parallel lines.

(ii) A = B = C = 0.

Again, speaking roughly, we say that (4) gives each of x and y in the
form $\frac{0}{0}$ and that x and y are 'indeterminate'. More strictly, $a_2 = \lambda a_1$,
$b_2 = \lambda b_1$ and $c_2 = \lambda c_1$ for some multiple λ. The equations are identical,
shown graphically by coincident lines.
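The main case and the two degenerate cases can be sorted out mechanically from A, B and C. The Python sketch below assumes integer coefficients (so that exact fractions can be used) and assumes that each equation genuinely involves x or y, i.e. represents a line; the function name `solve_pair` and the case labels are hypothetical.

```python
from fractions import Fraction


def solve_pair(a1, b1, c1, a2, b2, c2):
    """Classify a1*x + b1*y + c1 = 0, a2*x + b2*y + c2 = 0 (integer coefficients)."""
    A = b1 * c2 - b2 * c1
    B = c1 * a2 - c2 * a1
    C = a1 * b2 - a2 * b1
    if C != 0:
        # Main case: unique solution x = A/C, y = B/C from (4).
        return ("unique", Fraction(A, C), Fraction(B, C))
    if A != 0 or B != 0:
        # Degenerate case (i): parallel lines.
        return ("inconsistent", None, None)
    # Degenerate case (ii): coincident lines.
    return ("indeterminate", None, None)


print(solve_pair(1, 1, -3, 1, -1, -1))  # x + y = 3 and x - y = 1
print(solve_pair(1, 1, -1, 2, 2, -5))   # parallel lines
print(solve_pair(1, 1, -1, 2, 2, -2))   # the same line twice
```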


A.5. Completing the square. An essential property of the system of 
real numbers is that an expression in real variables which is a perfect 
square must be non-negative. It is zero if the terms inside the square 
vanish ; otherwise it is positive. Further, if an expression is the sum 
of two or more perfect squares, then it must be positive, except that 
it is zero when all the squares separately vanish. This property is 
not necessarily true of other number systems; with complex num- 
bers, for example, a perfect square can be negative: i? = — 1. 

Much use of this property of real numbers is made in algebra. For 
example, completing the square helps in a quadratic polynomial: 


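$$2x^2 - x - 3 = 2\big(x^2 - \tfrac{1}{2}x + \tfrac{1}{16}\big) - 3 - \tfrac{1}{8} = 2\big(x - \tfrac{1}{4}\big)^2 - \tfrac{25}{8}$$

Hence: $$(2x^2 - x - 3) - \big(-\tfrac{25}{8}\big) = 2\big(x - \tfrac{1}{4}\big)^2 \ge 0$$

i.e. $2x^2 - x - 3$ is greater than $-\frac{25}{8}$, except that it equals $-\frac{25}{8}$ when
$x = \frac{1}{4}$. The quadratic expression has the smallest value $-\frac{25}{8}$ when
$x = \frac{1}{4}$. To generalise:

Given: $y = ax^2 + bx + c$ (a > 0)

write $$y = a\Big(x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}\Big) + c - \frac{b^2}{4a} = a\Big(x + \frac{b}{2a}\Big)^2 - \frac{b^2 - 4ac}{4a}$$

i.e. $$y - \Big(-\frac{b^2 - 4ac}{4a}\Big) = a\Big(x + \frac{b}{2a}\Big)^2 \ge 0$$

i.e. y has minimum $-\frac{b^2 - 4ac}{4a}$ when $x = -\frac{b}{2a}$.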
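The general result for the minimum of a quadratic with a > 0 is easily checked against the worked example. The Python sketch below uses exact fractions; the function name `quadratic_minimum` is an assumption made for the illustration.

```python
from fractions import Fraction


def quadratic_minimum(a, b, c):
    """Return (x, y) at the minimum of y = a*x**2 + b*x + c, for a > 0."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a <= 0:
        raise ValueError("a must be positive for a minimum")
    x_min = -b / (2 * a)
    y_min = -(b * b - 4 * a * c) / (4 * a)
    return x_min, y_min


x_min, y_min = quadratic_minimum(2, -1, -3)
print(x_min, y_min)  # the example 2x**2 - x - 3

# The value at the minimum point agrees with the formula.
assert 2 * x_min ** 2 - x_min - 3 == y_min
```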


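As another example of reducing a difference to a square, we show
that $\frac{1}{2}(x + y) \ge \sqrt{xy}$ for all positive real x and y (equals only if x = y) by
writing the difference:

$$\tfrac{1}{2}(x + y) - \sqrt{xy} = \tfrac{1}{2}\{(\sqrt{x})^2 + (\sqrt{y})^2 - 2\sqrt{x}\sqrt{y}\} = \tfrac{1}{2}(\sqrt{x} - \sqrt{y})^2 \ge 0.$$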




Squaring up is a useful process. It is, however, not reversible; the
square of a is $a^2$ but the square root of $a^2$ is either a or -a. To
illustrate this disadvantage, we can attempt to solve $2x - 1 + \sqrt{2x + 1} = 0$:

$$2x - 1 = -\sqrt{2x + 1}.$$

Square: $4x^2 - 4x + 1 = 2x + 1$, i.e. $2x(2x - 3) = 0$.
Hence x = 0 and x = 3/2 are possible. Checking, we find that x = 0 does
satisfy the original equation, but not x = 3/2. However,

$$2x - 1 - \sqrt{2x + 1} = 0,$$

a different equation, squares up to the same quadratic; x = 3/2 satisfies
this equation, but not x = 0.
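The checking step is the essential one and it is easily mechanised. The following Python sketch substitutes each candidate from the squared equation back into both of the original equations; the helper name `satisfies` and its `sign` argument are assumptions made for the example.

```python
import math


def satisfies(x, sign):
    """Test whether 2x - 1 + sign * sqrt(2x + 1) = 0, with sign = +1 or -1."""
    value = 2 * x - 1 + sign * math.sqrt(2 * x + 1)
    return math.isclose(value, 0.0, abs_tol=1e-12)


candidates = [0.0, 1.5]  # roots of the squared equation 2x(2x - 3) = 0

for x in candidates:
    print(x, satisfies(x, +1), satisfies(x, -1))
```

Each candidate satisfies exactly one of the two equations, so neither can be accepted without the check.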


A.6. Clearing the denominator. The numerator in a ratio is more 
easily handled than the denominator. The device of clearing the 
denominator of unwanted elements (e.g. roots) is a very useful one 
in algebra. Simple examples illustrate: 
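$$\frac{1}{\sqrt{2} + 1} = \frac{\sqrt{2} - 1}{(\sqrt{2} + 1)(\sqrt{2} - 1)} = \frac{\sqrt{2} - 1}{(\sqrt{2})^2 - 1} = \sqrt{2} - 1$$

$$\frac{\sqrt{2} + 1}{\sqrt{2} - 1} = \frac{(\sqrt{2} + 1)^2}{(\sqrt{2} - 1)(\sqrt{2} + 1)} = \frac{(\sqrt{2})^2 + 2\sqrt{2} + 1}{(\sqrt{2})^2 - 1} = 3 + 2\sqrt{2}$$

both depending on the simple result: $(\sqrt{2} + 1)(\sqrt{2} - 1) = (\sqrt{2})^2 - 1 = 1$.
The trick is to multiply numerator and denominator by the same
expression, so chosen to clear the denominator of the awkward $\sqrt{2}$.
More generally:

$$\frac{a + b\sqrt{2}}{c + d\sqrt{2}} = \frac{(a + b\sqrt{2})(c - d\sqrt{2})}{(c + d\sqrt{2})(c - d\sqrt{2})} = \frac{ac + bc\sqrt{2} - ad\sqrt{2} - 2bd}{c^2 - 2d^2}$$

i.e. $$\frac{a + b\sqrt{2}}{c + d\sqrt{2}} = \Big(\frac{ac - 2bd}{c^2 - 2d^2}\Big) + \Big(\frac{bc - ad}{c^2 - 2d^2}\Big)\sqrt{2} \qquad (5)$$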


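The result (5) shows that the ratio of two numbers of the form
$a + b\sqrt{2}$, where a and b are rationals, is a number of the same form.
The same device works in many other cases. For example:

$$\sqrt{x + 1} - \sqrt{x} = \frac{(\sqrt{x + 1} - \sqrt{x})(\sqrt{x + 1} + \sqrt{x})}{\sqrt{x + 1} + \sqrt{x}} = \frac{1}{\sqrt{x + 1} + \sqrt{x}}$$

if x is real and positive. Since $\sqrt{x + 1} > \sqrt{x}$, so $\sqrt{x + 1} + \sqrt{x} > 2\sqrt{x}$ and:

$$\sqrt{x + 1} - \sqrt{x} = \frac{1}{\sqrt{x + 1} + \sqrt{x}} < \frac{1}{2\sqrt{x}} \quad \text{(all positive real x).}$$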




The device is of particular use in handling ratios of complex numbers:

$$\frac{a + ib}{c + id} = \frac{(a + ib)(c - id)}{(c + id)(c - id)} = \frac{ac + ibc - iad + bd}{c^2 + d^2}$$

i.e. $$\frac{a + ib}{c + id} = \Big(\frac{ac + bd}{c^2 + d^2}\Big) + i\Big(\frac{bc - ad}{c^2 + d^2}\Big) \qquad (6)$$

The similarity between (5) and (6) is clear. In (6), the denominator is
cleared of the awkward i. It shows that the ratio of two complex
numbers is also a complex number.
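Results (5) and (6) can both be checked with exact rational arithmetic. In the Python sketch below a number $a + b\sqrt{2}$ or $a + ib$ is held as the pair (a, b); the function names `divide_sqrt2` and `divide_complex` are hypothetical, and the second is compared with Python's built-in complex division only as a check.

```python
from fractions import Fraction


def divide_sqrt2(a, b, c, d):
    """(a + b*sqrt2) / (c + d*sqrt2) as a pair (p, q) meaning p + q*sqrt2, by (5)."""
    denom = Fraction(c * c - 2 * d * d)  # non-zero unless c = d = 0, for rationals
    return Fraction(a * c - 2 * b * d) / denom, Fraction(b * c - a * d) / denom


def divide_complex(a, b, c, d):
    """(a + ib) / (c + id) as a pair (p, q) meaning p + iq, by (6)."""
    denom = Fraction(c * c + d * d)
    return Fraction(a * c + b * d) / denom, Fraction(b * c - a * d) / denom


print(divide_sqrt2(1, 0, 1, 1))   # 1/(sqrt2 + 1) = -1 + sqrt2
print(divide_sqrt2(1, 1, -1, 1))  # (sqrt2 + 1)/(sqrt2 - 1) = 3 + 2*sqrt2
p, q = divide_complex(1, 2, 3, 4)
print(p, q, complex(1, 2) / complex(3, 4))
```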


A.7. Trigonometric Ratios 

(i) In elementary geometry and trigonometry, an angle θ is
measured in degrees. Consider first an acute angle θ, i.e. θ positive and
less than a right angle (0° < θ < 90°). The definition of the
trigonometric ratios is given in terms of the triangle OPM of
Fig. A.7, any right-angled triangle with ∠POM = θ:

[Fig. A.7]

DEFINITION: The trigonometric ratios (cosine, sine and tangent) of the acute
angle θ are:

$$\cos\theta = \frac{OM}{OP}, \quad \sin\theta = \frac{MP}{OP}, \quad \tan\theta = \frac{MP}{OM}.$$

Any right-angled triangle of the shape of OPM will do since all such
triangles are similar, i.e. the ratios of sides are the same for all.

It follows immediately that tan θ is the ratio of sin θ to cos θ and
(from Pythagoras' Theorem $OP^2 = OM^2 + MP^2$) that the sum of the
squares of sin θ and cos θ is 1. As a convenient notation, write
$(\sin\theta)^2 = \sin^2\theta$ and similarly for the others. Hence:

$$\tan\theta = \frac{\sin\theta}{\cos\theta} \quad \text{and} \quad \sin^2\theta + \cos^2\theta = 1 \qquad (7)$$

Three trigonometric ratios are defined here and it is useful to have
all three. But the results (7) show that any two can be expressed in
terms of the other. Perhaps the simplest expression of this is:

$$\cos\theta = \frac{1}{\sqrt{1 + t^2}} \quad \text{and} \quad \sin\theta = \frac{t}{\sqrt{1 + t^2}} \quad \text{where } t = \tan\theta.$$




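Values of the ratios for particular angles are obtained by drawing
appropriate triangles. Limits as θ → 0° or 90° can be written. So:

    θ        0°     30°     45°     60°     90°
    sin θ    0      1/2     1/√2    √3/2    1
    cos θ    1      √3/2    1/√2    1/2     0
    tan θ    0      1/√3    1       √3      *

    * tan θ → ∞ as θ → 90°.

The addition formulae are:

$$\begin{aligned}
\cos(\theta_1 + \theta_2) &= \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 & \cos(\theta_1 - \theta_2) &= \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2 \\
\sin(\theta_1 + \theta_2) &= \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2 & \sin(\theta_1 - \theta_2) &= \sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2 \\
\tan(\theta_1 + \theta_2) &= \frac{\tan\theta_1 + \tan\theta_2}{1 - \tan\theta_1\tan\theta_2} & \tan(\theta_1 - \theta_2) &= \frac{\tan\theta_1 - \tan\theta_2}{1 + \tan\theta_1\tan\theta_2}
\end{aligned} \qquad (8)$$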


(ii) For a positive angle θ of any size, measured in degrees, the
definition of the trigonometric ratios needs extension. In Fig. A.7,
let P rotate anticlockwise around a circle centred at O describing an
angle θ from the starting position A. Take signed distances from O
along OA, positive to the right and negative to the left of O; take
signed distances vertically (parallel to OB), positive upwards and
negative downwards. Then proceed by stages:
When P is in the second quadrant (as P' in Fig. A.7), θ varies
between 90° and 180°. Keep the definition of the trigonometric
ratios, except that OP is positive but other lengths have a sign
(+ or -). So, for θ' of Fig. A.7, we have

$$\cos\theta' = \frac{OM'}{OP'} = \text{negative}, \quad \sin\theta' = \frac{M'P'}{OP'} = \text{positive}$$

and $$\tan\theta' = \frac{\sin\theta'}{\cos\theta'} = \frac{M'P'}{OM'} = \text{negative}.$$




This is equivalent (as can be seen from Fig. A.7 with θ' = 180° - θ) to
writing the following for θ acute, 180° - θ in the second quadrant:
$$\cos(180^\circ - \theta) = -\cos\theta \quad \text{and} \quad \sin(180^\circ - \theta) = \sin\theta.$$
The extension to the third quadrant follows, equivalent to:
$$\cos(180^\circ + \theta) = -\cos\theta \quad \text{and} \quad \sin(180^\circ + \theta) = -\sin\theta$$
and so to the fourth quadrant:
$$\cos(360^\circ - \theta) = \cos\theta \quad \text{and} \quad \sin(360^\circ - \theta) = -\sin\theta.$$

This takes care of positive angles up to 360°. For still larger angles,
P has rotated around the complete circle one or more times. Then:
$$\cos(n360^\circ + \theta) = \cos\theta \quad \text{and} \quad \sin(n360^\circ + \theta) = \sin\theta$$
for any positive integer n.
Finally, for a negative angle θ, the rotation of P is taken from A
in the negative or clockwise direction. Then:
$$\cos(-\theta) = \cos\theta \quad \text{and} \quad \sin(-\theta) = -\sin\theta$$
sometimes expressed by saying that cos θ is an 'even' function and
sin θ is an 'odd' function.
The results (7) and (8) hold for angles of any size. Indeed, the
addition formulae for $\theta_1 - \theta_2$ in (8) follow from those for $\theta_1 + \theta_2$ by
substituting $(-\theta_2)$ for $\theta_2$.


A.8. Triangles. Denote the angles of a triangle by A, B and C and the
opposite sides by a, b and c respectively. The relations between sides
and angles are:

$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} \quad \text{and} \quad a^2 = b^2 + c^2 - 2bc\cos A \qquad (9)$$

The second relation of (9) has two similar forms, for $b^2$ and for $c^2$
respectively. As a check, put A = 90° so that sin A = 1 and cos A = 0.
The first relations of (9) give $\sin B = \frac{b}{a}$ and $\sin C = \frac{c}{a}$, the definition of
these trigonometric ratios. From the second relation of (9):
$$a^2 = b^2 + c^2,$$
and this is Pythagoras' Theorem for a right-angled triangle.

Two triangles are similar if corresponding angles are equal:
A = A', B = B' and C = C'. From the first relations of (9), sides are
proportional, i.e. a : b : c = a' : b' : c'. As a particular case, if
corresponding sides are also equal, then the triangles are congruent.




Necessary and sufficient conditions for congruent triangles are: (i) 
three sides correspond, or (ii) two sides and included angle correspond, 
or (iii) one side and two angles correspond. 

The area Δ of a triangle ABC, defined as half the product of the
base and the height, can be expressed in terms of sides and angles:

$$\Delta = \tfrac{1}{2}bc\sin A = \tfrac{1}{2}ca\sin B = \tfrac{1}{2}ab\sin C \qquad (10)$$

The equality of the three expressions for Δ in (10) is ensured by the
first relations of (9). The common value of the ratios $\frac{a}{\sin A}$, $\frac{b}{\sin B}$
and $\frac{c}{\sin C}$ is then seen to be $\frac{abc}{2\Delta}$.
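Relations (9) and (10) can be verified numerically for a triangle given by its three sides. The Python sketch below finds the angles from the cosine relation and then compares the ratios $a/\sin A$ with $abc/2\Delta$; the function name `triangle_from_sides` and the sample sides are assumptions made for the example.

```python
import math


def triangle_from_sides(a, b, c):
    """Angles A, B, C (in radians) and area of the triangle with sides a, b, c."""
    # Second relation of (9), in each of its three forms, solved for the cosine.
    A = math.acos((b * b + c * c - a * a) / (2 * b * c))
    B = math.acos((c * c + a * a - b * b) / (2 * c * a))
    C = math.acos((a * a + b * b - c * c) / (2 * a * b))
    area = 0.5 * b * c * math.sin(A)  # first expression of (10)
    return A, B, C, area


a, b, c = 5.0, 4.0, 3.0  # a right-angled triangle with hypotenuse a
A, B, C, area = triangle_from_sides(a, b, c)

print(math.degrees(A), math.degrees(B), math.degrees(C))
print(area, 0.5 * c * a * math.sin(B), 0.5 * a * b * math.sin(C))
print(a / math.sin(A), b / math.sin(B), c / math.sin(C), a * b * c / (2 * area))
```

With A = 90° the ratios reduce to sin B = b/a and sin C = c/a, as in the check made in the text.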


A.9. Cartesian and polar co-ordinates. Fix axes Ox and Oy in a plane,
each being a directed line on which a suitable scale of measurement
is taken. In Fig. A.9, Ox is drawn horizontally to the right and Oy
vertically upwards. Any point P in the plane
needs two co-ordinates to fix it, corresponding to
the two dimensions of the plane. Alternative
systems of pairs of co-ordinates can be devised
and two of them are in common use. In one
system, that of Cartesian co-ordinates, the number
pair (x, y) is attached to a point P, where
x is the distance OM and y is the distance MP
in Fig. A.9. Here each of x and y is any real number (positive, negative
or zero). So, if x > 0, P is to the right of Oy; if x = 0, P is on Oy; if
x < 0, P is to the left of Oy. Similarly, the sign of y determines
whether P is above, on or below Ox.

[Fig. A.9]

The other system, that of polar co-ordinates, attaches the number
pair (r, θ) to P, where r is the distance OP and θ is the angle (in
degrees) which OP makes anti-clockwise with Ox. Both co-ordinates
are real numbers, r being zero or positive and θ lying in the interval
0° ≤ θ < 360°.

The relations between the co-ordinate systems follow at once:

$$x = r\cos\theta \quad \text{and} \quad y = r\sin\theta \qquad (11)$$

This is a transformation from given polar co-ordinates (r, θ) to
corresponding Cartesian co-ordinates. The inverse transformation,
also needed, is more difficult to specify. By use of (7):

$$r^2 = x^2 + y^2 \quad \text{and} \quad \tan\theta = \frac{y}{x}$$

which are also checked from Fig. A.9. Hence:

$$r = \sqrt{x^2 + y^2} \quad \text{and} \quad \theta = \tan^{-1}\frac{y}{x} \qquad (12)$$

Since r is positive, the value of r in (12) is unambiguous, the positive
square root of the positive expression $x^2 + y^2$ (apart from the special
case x = y = 0 with r = 0). The value of θ is not unambiguous as shown
in (12) since there are two angles in the interval 0° ≤ θ < 360° with a
given value for tan θ. (Fig. 12.7 shows how tan θ repeats itself in
this interval.) One of these two values is to be selected and the other
rejected. The criterion is: find r and select θ so that cos θ = x/r (with
sign of x) and so that sin θ = y/r (with sign of y).
Hence (12) must be read subject to the condition:
Of the two values of θ in the range 0° ≤ θ < 360° with tan θ = y/x,
that value is taken for which cos θ has the sign of x and sin θ the
sign of y.
The other value of θ in the range is to be rejected; it is such that cos θ
has the opposite sign to x and sin θ the opposite sign to y.
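The selection of the right value of θ is what the two-argument arctangent of most programming languages performs. The Python sketch below uses `math.atan2(y, x)`, which takes account of the signs of x and y, in place of the rule stated above, and then brings the angle into the interval from 0° up to (but not including) 360°. The function names are hypothetical, and the angle 0° returned for the origin is a convention assumed here, since θ is not determined when r = 0.

```python
import math


def to_polar(x, y):
    """Cartesian (x, y) to polar (r, theta), theta in degrees with 0 <= theta < 360."""
    r = math.hypot(x, y)
    if r == 0:
        return 0.0, 0.0  # angle undetermined at the origin; 0 by convention
    theta = math.degrees(math.atan2(y, x)) % 360.0
    return r, theta


def to_cartesian(r, theta):
    """Polar (r, theta in degrees) to Cartesian (x, y), relations (11)."""
    t = math.radians(theta)
    return r * math.cos(t), r * math.sin(t)


# (1, 1) and (-1, -1) have the same tan(theta) = 1 but different angles.
print(to_polar(1, 1))
print(to_polar(-1, -1))
print(to_cartesian(*to_polar(-1, -1)))
```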


EXERCISES: SOLUTIONS 


19 1. γ--32 1.85; for 0<xz<100. 2. 
3. ἀ Ξε, 4. χ-- ὃ; νϑ «2. 
10. Roots: 1 (twice) and —4; 0 (twice) and ὅ. 
13. «=4(1—-k), y=4$(1 +h). 
2.9 1. No. 20. x | O12. 
14. Use Ex. 13 with 0,=0,=8. 010900 
17. eel, l 012 
2/;021 
3.9 4. Note that the second gives i= +(1+4)//2(2.9 Ex. 15). 
8. All real x(* +1) for each rational fraction; but all real «(+ -- 1) 
for second reduced form. 
11. Of quadratics z*+ax+b where (a, δ) =(0, 0), (0, 1), (1, 0) or (1, 1), 
only z? +2 +1 has no factors. 
13. If « real (complex) then g(x) has real (complex) coefficients. Other 
roots: —1; none; 1; none. 
15. +1 and (147,/3)//2; 1 and +2. 
18. J,={..r-10,r—5, 7,7 +5, 7 +10, ...} 
e.g. Jy ={...-9, —4, 1, 6, 11, ...}. 
4.9 2. 1,2, 3,4, 5, 6, 5, 4, 3, 2, 1 elements; total 36 elements in As 
4. 8;12. 7. Rule 7(a). 8. Rules 2(a), 4(a), 8(α). 
10. 25, 40, 20, 20. Intersection of each of C’, ANC, A’ with each of 


5.9 1 


18. 


19 


Y,N,(YUN)’. 


. Yes, accepting Axiom of Choice. 


(i) high, some not rising; (ii) and (11) not high, not all rising; 
(iv) high, all rising. 
(a)~p; (b)qvr; (c) ~qAr; (4) ~pA(~GAT); (€-) PAG>PAT. 


— 


FF T 6. pqr ~pAgd avr qr 
: Men S | TEFF F EF T 
Mi TTT EF T T 

WorD|TET F Τ' T 

a, and a, exhaus- Women ᾿ ᾿ τ τ Zl Β Τ 
tive: P(a,Va,)=1l. Ῥ T 
WorD; FFT F T T 


Evens. 


6.9 


7.9 


8.9 


9.9 


10.9 


21. 
28. 


22. 
80. 


19. 


12. 


18. 




Each = 4/20 = 1/5. 

Let $a_1$ = B's hand better, $a_2$ = B's hand worse, a = B raises. Then:
$P(a_1) = 1/10$ and $P(a_2) = 9/10$. $P(a \mid a_1) = 9/10$ and $P(a \mid a_2) = 1/5$.
By Bayes: $P(a_1 \mid a) : P(a_2 \mid a) = 9 : 18$.


: (")> (5. for r=], 2,...4(n -- Ἰ); (*) =(.",) for r=4(n +1). 


. For: δε b14«a=e%a=a; if b* ας, τοῦ * x, then 7, =2,. 
. If integers (excluding 0) are a group, then 0 never appears in 


x table. 


. Write rotation through 0° as 1. If rotations through 90°, 180°, 270° 


are x, A, μ, then AxA=l and A=—-1;«xK=-land κεῖ; Χμξὶ 
and p= -- 

E.g. if 2, =ib, 2. =jc, then 2,2, =kbe but zz, = — kbc. 

Draw graphs of y=2?+2+1 and y=1/(1 - 3). 


. Note: x is not brother of himself; y brother of x may imply z is 


sister of y. 


. For <Ry =z and y both even, xRzx true if x even, false if x odd. 
. Gradient either AB/OA (1'΄ vertically for 12’ horizontally) or 


AB/OB (1’ vertically for 12’ up hill). Former is slope of OB. 


. First result: (z, y) and its negative (a, ὃ) add to zero (0,0). So 


(0, 0) -Ξ (α, ψ) +(a, δ) =(w@+a,y+b). 0. a= —-2%, b= -y. 


. No. 0<A<1 for points on PQ between P and Q; other A for points 


not between P and Q. 

Slope of $P_1P_2$, $m = (y_2 - y_1)/(x_2 - x_1)$ when $x_1 \ne x_2$. But m does not
exist when $x_1 = x_2$ (line parallel to Oy); $m \to \infty$ as $x_1 \to x_2$ (Chap. 9).
$c^2 > r^2(a^2 + b^2)$; this is also condition that perpendicular distance
from O to line is greater than r.


. Inverse of y=az? on «20 is x=./(y/a) on ψ:5 0 (a>0) but on 


y <0 (a <0). Both increasing (a> 0) or both decreasing (a <0). 


. Write u=1—./(1 +2) defined on x> -1 with range u<l. 
. 8,7 +0 ifr=+1;S, oscillates ifr = -1. 
. (i) y0, (ii) and (iii) y>2, (iv) y> 40, (v) and (vi) y>4. y is defined 


at x= — I] in (ii) and (v), not in (111) and (vi). 


. f(x) is 1/./u where u = 1 + 1/2. 


. Lines with slopes m and m’ are perpendicular if mm’ = -- 1; normal 


has slope — 1/f’(x,). Tangent: 27,7 --ἰ -- αἵ =0; normal: 
x +2a,y —x,(1 +277) =0. 


. D(Yx?) = [) («Ῥίαγ = oe =f ϑαν-α. 


$Dy = (1 + x^2)/(1 - x^2)^2$; $D^2y = 2x(3 + x^2)/(1 - x^2)^3$. $y \to \infty$ as $x \to 1$ from
below.
Greatest height when v =0, t=u/g, x =u?/2g. 




11.9 


12.9 


15. 


21. 


22. 


23. 


27. 




| ως aR : 
R=px= 2x(1 -x) with — =2(1 -- 2.) (£ per item). R increases to $ 


dx 
as maximum (£500) at x =4 (500 items per νὰ ἢ 
2 
Integral is -- 4fu-? du = ξιι 1. Consistent since ἐχ τη a << — + 1). 


Write Jo dx = =| -|D px( |S - 1) dx. 


fa™-1(1 —a)"- da =f (1 -—u)™u"-1( -- 1) du= -- fu®(1 —u)™ Δ du; 
limits (0, 1) for x go to (1, 0) for u, so that 


1 0 
ant —2£)"—1 dx= ΠΝ -)- du 


1 1 
= (un —u)™— du =a" - «)}-1 da. 


. Yes, 5-Ξ} -$/3. 

. Sufficient that R’(«) =C’(«) and Κ΄ (α) <C’’(«). 

. Min. z=1 at x =4/5, y=1/5. 

. Take sides of rectangle as 2x and 2y; corners symmetrically on circle. 
. Alternative: smallest circle from which rectangle of given area can 


be cut. 


. Max. R=px +qy subject to φ(“;, y) =0 gives ¢’, :p=¢’, : ᾳ (by Ex. 


13). 


. 3 or 4 terms of second series give log,, 2 =0-301. 
. Proofs in Hardy, Pure Mathematics (10th Ed., 1952), Examples 


XXIX. Note: to avoid terminating decimals, express as recurring 


by Ex. 25. 
1 1 


+0! @+2)! 
Second relation: τα! Ξρᾷ -- 1)! and. 5.41 -- 4 Ἐφ} ἘΦ: +... (all 


First relation: write e—S,= +... and multiply q!. 


positive integers). But there is no positive integer less than 1/q. 


. « or B negative integer: series terminates; y negative integer: series 


not defined after a finite number of terms. 


. Series: 1 + $02 + 804+ 752° +.... 

. Dett* = +xe+3*”, 

. Yes; y decreases at constant proportionate rate. 
I 

. From fz, e" dt, show y =7a(l —e ὅ97) and u =*e. 


. [65 sin x dz =e* sin x -- fe* cos x dx and fe* cos x dz similarly; then 


substitute from one to other. 


00 ie 2) 
. y symmetric about 2 =0; so [ y dx =i" oy dx =4. Then: 


[ et" da = -- ἐν π and Γ(ξ) Ξ v2. ent” dt τε π. 


13.9 


14.9 


20. 


. Particular integral y = 




] 
. Xs =4{ — Y3(By 233 — A332) + Yo(211%33 — ας 1) — Yg(A 11839 4430.) } 


] 
3 = Ay (aut — α14639) — Ya (A11G 93 + αχ50 4.) + Ys (411029 — ax). 


. First 2 equations (5) give $x_1$ and $x_2$ by (8) of 13.3. Substitute in third equation for relation:

$y_1(a_{21}a_{32} - a_{22}a_{31}) - y_2(a_{11}a_{32} - a_{12}a_{31}) + y_3(a_{11}a_{22} - a_{12}a_{21}) = 0$.
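The relation is the condition for three linear equations in the two unknowns x1, x2 to be consistent: y1(a21 a32 - a22 a31) - y2(a11 a32 - a12 a31) + y3(a11 a22 - a12 a21) = 0. The sketch below builds a consistent system from hypothetical coefficients and a chosen (x1, x2), evaluates the expression exactly, and then perturbs the third equation. All numerical values and the function name are assumptions for illustration.

```python
from fractions import Fraction


def consistency_expression(a, y):
    """y1(a21 a32 - a22 a31) - y2(a11 a32 - a12 a31) + y3(a11 a22 - a12 a21).

    a is a 3x2 list of coefficients (row r holds a_r1, a_r2); y has 3 entries.
    """
    (a11, a12), (a21, a22), (a31, a32) = a
    y1, y2, y3 = y
    return (y1 * (a21 * a32 - a22 * a31)
            - y2 * (a11 * a32 - a12 * a31)
            + y3 * (a11 * a22 - a12 * a21))


# Hypothetical coefficients; y is computed from a chosen (x1, x2),
# so the three equations are consistent by construction.
a = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)], [Fraction(4), Fraction(-1)]]
x1, x2 = Fraction(5), Fraction(-2)
y = [row[0] * x1 + row[1] * x2 for row in a]

consistent = consistency_expression(a, y)
perturbed = consistency_expression(a, [y[0], y[1], y[2] + 1])
print(consistent, perturbed)
```

Changing the third right-hand side by 1 changes the expression by a11 a22 - a12 a21, so the relation fails as soon as the third equation disagrees with the first two.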


. Inner products $\sum_{s=1}^{n} b_{rs}c_s$ for $r = 1, 2, \dots m$.


. See (1) and (2) of 13.8. 

. $(r, s)$th element is $x_r y_s$ in $xy'$ and $y_r x_s$ in $yx'$.

. $AB$ is $m \times k$ and $(AB)C$ is $m \times n$.

. If $A$ is $m \times n$, first $O$ is $n \times n$, second $m \times m$, third $m \times n$.


n 
. (r, 8)th element τ 2 a 


1 — 


gu =. 


. As $a \to +1$, $y \to \infty$ for each $x$, i.e. oscillations of ‘infinite’ amplitude.
. If $a_{11}a_{22} - a_{12}a_{21} \neq 0$, solve the equations for $Dy$ and $Dz$.


. Equations become $Dy = (a_{11} + ka_{12})y = \left(a_{22} + \dfrac{1}{k}a_{21}\right)y$. Two values of $k$ from: $a_{12}k^2 + (a_{11} - a_{22})k - a_{21} = 0$.
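A small numerical illustration of the quadratic in k may help. The sketch assumes the pair of equations has the form Dy = a11 y + a12 z, Dz = a21 y + a22 z and that the trial relation is z = ky (both are assumptions about the exercise, which is not reproduced here); the coefficient values and the function name are hypothetical.

```python
import math


def k_values(a11: float, a12: float, a21: float, a22: float) -> list[float]:
    """Real roots of a12*k^2 + (a11 - a22)*k - a21 = 0 (assumes a12 != 0)."""
    b = a11 - a22
    disc = b * b + 4 * a12 * a21
    if disc < 0:
        raise ValueError("complex roots: this sketch handles the real case only")
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a12), (-b - root) / (2 * a12)]


# Hypothetical coefficients for Dy = a11*y + a12*z, Dz = a21*y + a22*z.
a11, a12, a21, a22 = 1.0, 2.0, 3.0, 2.0
ks = k_values(a11, a12, a21, a22)
for k in ks:
    # With z = k*y, both equations must give the same value of Dy / y.
    print(k, a11 + k * a12, a22 + a21 / k)
```

For each root the two expressions for Dy / y agree, which is exactly what the quadratic is designed to ensure.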


. Solution: $y_n = Aa^n + B(-a)^n$ where $y_0 = A + B$ and $y_1 = (A - B)a$.

Hence: $y_n = \tfrac{1}{2}y_0\{a^n + (-a)^n\} + \dfrac{1}{2a}y_1\{a^n - (-a)^n\}$, giving $y_n = y_0a^n$ if $n$ even, $y_n = y_1a^{n-1}$ if $n$ odd.
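The closed form can be compared with direct iteration. A solution A a^n + B (-a)^n has auxiliary roots a and -a, so the sketch below assumes the underlying difference equation is y(n+2) = a^2 y(n); that recurrence, the sample values and the function names are assumptions of the sketch rather than a quotation of the exercise.

```python
from fractions import Fraction


def closed_form(n: int, a: Fraction, y0: Fraction, y1: Fraction) -> Fraction:
    """y_n = (1/2) y0 {a^n + (-a)^n} + (1/(2a)) y1 {a^n - (-a)^n}."""
    return y0 * (a**n + (-a)**n) / 2 + y1 * (a**n - (-a)**n) / (2 * a)


def by_recurrence(n: int, a: Fraction, y0: Fraction, y1: Fraction) -> Fraction:
    """Iterate y_{k+2} = a^2 y_k (the recurrence assumed for this sketch)."""
    values = [y0, y1]
    while len(values) <= n:
        values.append(a * a * values[-2])
    return values[n]


a, y0, y1 = Fraction(3, 2), Fraction(2), Fraction(5)
for n in range(8):
    print(n, closed_form(n, a, y0, y1), by_recurrence(n, a, y0, y1))
```

Exact fractions are used so that the even and odd cases (y0 a^n and y1 a^(n-1)) can be compared without rounding error.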
If $S$ and $C$ are the two Laplace Transforms, the results obtained give $S = \dfrac{1}{a} - \dfrac{p}{a}C$ and $C = \dfrac{p}{a}S$, i.e. $S = a/(p^2 + a^2)$ and $C = p/(p^2 + a^2)$.
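The two closed forms a/(p^2 + a^2) and p/(p^2 + a^2) can be checked by numerical integration. The sketch below assumes that S and C denote the Laplace Transforms of sin at and cos at, i.e. the integrals of exp(-pt) sin at and exp(-pt) cos at over t from 0 to infinity, and truncates the range at t = 20; the sample values of a and p, the cut-off and the function name are illustrative assumptions.

```python
import math


def laplace(f, p: float, upper: float = 20.0, steps: int = 100_000) -> float:
    """Midpoint estimate of the integral of exp(-p t) f(t) over (0, upper)."""
    h = upper / steps
    return h * sum(math.exp(-p * (i + 0.5) * h) * f((i + 0.5) * h) for i in range(steps))


a, p = 3.0, 2.0  # illustrative values with p > 0
S = laplace(lambda t: math.sin(a * t), p)
C = laplace(lambda t: math.cos(a * t), p)
print(S, a / (p * p + a * a))
print(C, p / (p * p + a * a))
```

With these values the estimates should be close to 3/13 and 2/13, and the pair also satisfies C = (p/a) S to the same accuracy.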


INDEX 


Numbers refer to pages and those in italics to exercises 


Absolute value, 23, 43, 357 
Acceleration, 296 
Adjunction, 33, 41, 71, 158-9, 420, 453
Adler, Irving, 24
Algebra, Boolean, 94-7, 119, 379
elementary, 1-2, 7-18
linear, 364-7, 480-8
of functions, 238-42
of probabilities, 126
of statements, 119
total matrix, 390
Algebraic numbers, 160-2
Algorithm, 63
Altwerger, S. I., 25
Amplitude, of oscillation, 417-18, 427, 437
Angle, measure, 354-5
Anti-derivative, 275, 285-8
Archimedes (circa 250 B.C.), 262, 334
Area, as limit, 276-81, 283-4, 331
of circle, 262, 353-5
Argand (1768-1822), 43
Argand diagram, 43, 82-3, 357, 454
Argument (amplitude), 43, 357
Array, double, 20, 366, 377
Automorphism, 183, 197
Auxiliary equation, 412, 426
Axiom of Choice, 111, 458
Axiomatic approach, 2-7, 46, 89, 144, 200, 205-6, 443-4, 455-8
Axioms, Euclid’s, 200, 212
Peano’s, 443-4
Zermelo’s, 89, 455


Basis (vector space), 368, 372, 483 
Bayes (d. 1761), 132 

Bell, E. T., 25 

Birkhoff, G. (with S. MacLane), 25, 66 
Bessel (1784-1846), 439 

Bessel function, 333, 439 

Beta function, 298, 333, 362-3 
Binary system, 57 


Binomial coefficients, 23, 139 
distribution, 138—9 
theorem and series, 26, 138, 320-2, 
326, 347 
Bolyai (1802-60), 212 
Bolzano (1781-1848), 263 
Boole (1815-64), 94 


Cancellation, 30-1, 47, 144, 158 
Cantor (1845-1918), 89, 108, 263 
Cartesian co-ordinates, 210, 416, 500 
Cartesian product, 89, 171, 234, 456, 458 
Cauchy (1789-1857), 270, 313 
Church, A., 7 
Circular functions, 345—7, 352-6, 417-19, 
476-80 
Co-factor, 385 
Complement (sets), 92, 457 
Complementary function, 409, 422—4 
Complex numbers, 40-5, 71, 159, 356-7, 
453—4, 497 
Compound interest, 361, 406, 423 
Conditional, connective, 114 
probability, 130 
Conformable (matrices), 378 
Conformal transformation, 189, 350 
Conjugate complex, 79-80, 415, 426 
Conjunction, 114, 131 
Connective, 114 
Continuity, of functions, 256-9, 268, 
270-1, 465 
of variation, 265, 269 
Continuum, 39, 448 
Contraries and contradictories, 117 
Convergence, absolute, 314, 316, 351, 
471 
conditional, 314, 318 
of infinite integrals, 298, 313, 328 
of infinite series, 309 
of sequences, 248, 263 
radius of, 317, 473-4 
uniform, 325, 473 
Co-ordinate geometry, 214-20, 237 




Correspondence, one-one and many-one, 
97-100, 180-1 

Cosets, 168, 196 

Countable, 104 

Courant, R. (with H. Robbins), 25 

Cramer (1704-52), 401 

Cross-ratio, 221, 231 

Cubics, 26, 493-4 

Cut (Dedekind), 37—8, 447-8 


d’Alembert (1717-83), 313 
Damping, 417-18, 427, 437 
Decimals, 35, 57, 58, 329 
Dedekind (1831-1916), 263, 448
Degenerate cases, 18, 376, 395, 494-5
De Moivre (1667-1754), 60
Denumerable, 104 
Derivatives, as limits, 269, 467 
as operator D, 292-5, 419-23 
nth (successive), 293 
partial, 327 
rules and standard forms, 272-5, 335, 
338, 347—9, 358 
sign, 306, 327 
Descartes (1596-1650), 210 
Determinants, 384—7 
Difference equations, linear, 423-8, 438 
first order, 425 
second order, 426-8 
simultaneous, 428, 440 
Differential equations, linear, 406-19, 
422-3, 436-8 
first order, 411, 414-15 
second order, 411—13, 415-19, 423 
simultaneous, 414-15, 440 
Dimension (vector space), 365, 368, 372, 
481 
Dirichlet (1805-59), 235, 471 
Discontinuity, 256, 271 
Disjoint (sets), 93 
Disjunction, 114, 131
Divergence, of infinite series, 314 
Domain, 8, 12, 172, 178, 234, 235 
Duality, 96, 201, 224, 226 


Einstein, A., 200 
Elliptic space, 212-13 
Equation of real and imaginary parts, 45, 
350 
Equivalence classes, 176, 192 
relations, 99, 173—7, 192 
statements, 117, 120, 122 
up to an isomorphism, 182 
Euclid (circa 300 B.C.), 2, 59, 63, 199, 212, 262




Euclidean space, 207-11, 365 
Euclid’s division algorithm, 63, 76—7 
Euler (1707-83), 333 
Expansion of determinants, 386 
of functions, 316, 318-26, 333, 358 
Expansions, binomial, 320-2, 344 
cosine and sine, 345, 480 
exponential, 336, 476 
inverse tangent, 348, 480 
logarithmic, 340, 476 
Exponent, complex, 350, 352, 357, 415-16, 426-7
rational and real, 14-15, 341, 489-90
Exponential function, 334-7, 339-40, 350-2, 406, 434-5, 475-6
lag, 434
Extreme values, 307


Factor group, 168 
Factorial n, 23 
Factors and roots, 66, 74-80, 454 
Feed-back, 433 
Field, 31—4, 38, 42, 50, 54, 157-60, 460-1 
complete ordered, 38, 164, 233, 243 
extension, 170 
isomorphism, 186 
of convergent sequences, 263 
of polynomials, 86—7 
of rational fractions, 70—1, 160, 419-20, 
452-3 
of scalars, 205, 367, 461-2 
ordered, 31, 38, 163 
skew, 170 
Final residue, 37, 248, 252, 463 
Fourier (1768-1830), 441
Fraenkel, A. H. (with Y. Bar-Hillel), 458 
Frazer, R. A. (with W. J. Duncan and 
A. R. Collar), 25 
Frequency, 418 
Functions, algebraic, 236-42, 331 
as mappings, 179-81, 234-5, 261 
as relations, 177-8 
as transformations, 186—7 
composite (function of a function), 
239, 272-3, 335, 338 
definition of new, 326, 331-3, 410, 439 
derived, 269 
inverse, 241, 259, 272-3 
limits of, 250—5, 464-5 
monotone, 240, 259 
multi-valued, 178 
of complex variable, 74, 85, 189-91, 
350, 352, 435 
of real variable, 179, 216, 234-6, 250, 331
of two variables, 327 




Galileo (1564-1642), 105 
Galois (1811-32), 492 
Gamma function, 333, 363 
Gauss (1777-1855), 65 
Generating function, 432 
Geometric series, 244-5, 262, 310-12, 319, 
326, 347 
Geometric space (see vector space) 
Geometry, affine, 232
co-ordinate, 2, 214-20
Euclidean, 1-4, 199-200, 207-11, 214-20
metric, 201
non-Euclidean, 200, 211-14
projective, 2, 220-3
Goldberg, S., 128
Gradient (slope), 228
Graphs, 9-10, 73, 166
logarithmic, 360, 405-6
of function and locus, 179, 214
Gravitation, 296
Greatest lower bound (GLB), 39, 164, 232, 243, 450
Group, additive, 146, 168, 203, 228
affine, 197, 232
commutative, 144, 148, 155, 157, 460
cyclic, 149, 156-7, 168, 185, 357
full linear, 189, 390, 402
isomorphism, 184
multiplicative, 147-9
non-commutative, 148, 156, 169, 459
of equilateral triangle, 155-6
of operator D, 294
of projections, 221, 230
of rigid motions, 201, 230
of square, 169
of transformations, 153-7, 188-9
of translations, 155
of vectors, 203, 228
rules, 143-5, 458-9
Growth, rate of, 266-7, 339-40, 361, 406


Hardy, G. H., 4, 25 

Harmonic division, 231 

Highest common factor (H.C.F.), 62, 76 

Homogeneous co-ordinates, 224-8 
equations, 401, 404 

Homomorphism, 197 

Hyperbolic space, 212-14 
functions, 353, 362 

Hypergeometric series, 330, 363, 439 


Idempotent, 94 
Identity (equation), 10 




Identity of group, 30, 47, 144, 458 
Image, mapping, 179 
projective, 221 
Implication (statements), 117, 120 
Inequality, 11-12, 164-7, 379-80, 399 
Inf. and Sup., 263, 306, 466—7 
Infinite dimension, 368, 481 
integral, 298, 313, 328 
sets, 90, 104-11, 457-8 
Infinite series, absolute convergence, 314, 
316, 351, 471 
alternating, 310, 314-15 
conditional convergence, 314, 318 
convergence (tests), 308-15, 328 
divergence, 314 
of complex terms, 351 
oscillating, 311-12, 314
power, 316-18, 323-6, 350-2, 473-4 
uniform convergence, 325, 473 
Infinity, circular points at, 227-8 
line at, 225-7 
symbol for, 239, 245, 254 
Inflexion, points of, 307-8 
Initial conditions, 409-10, 424-5 
Inner products, 378, 400 
Integers, 46-9, 146, 182-4, 443-6
Gaussian, 64-6
mod n, 52, 55, 85, 97, 146, 148, 160, 185
Integral domain, 49, 68, 70, 159, 163, 
450, 460 
Integrals, and areas, 276-81, 283-4, 331
as anti-derivatives, 285-8
as limits, 278-81, 467-9
as operator D⁻¹, 292-5, 419-20
definite and indefinite, 285-8, 291-2
nth (multiple), 293-4
reduction formula, 359
rules and standard forms, 289-91, 335, 338, 347, 359
Integration by parts, 289, 299
by substitution, 290
of differential equation, 407
of power series, 324—6, 331-3 
Intersection (sets), 92, 456 
Intervals, nested, 37, 248 
Invariance, 199, 201, 220-2, 230 
Inverse, as negative or reciprocal, 30, 
144, 458, 490 
function, 241, 259, 272-3 
matrix, 387 
transformation, 155, 188, 366, 375, 
393, 402 
Irreducible polynomial, 78 
Isomorphism, 181—6, 195, 237, 484 




Jacobi (1804-51), 399 
Jacobian, 399 
Jaeger, J. C., 429 


Kemeny, J. G. (with J. L. Snell and G. L. 
Thompson), 24, 134, 135 
Kronecker (1823-91), 443


Lagrange (1736-1813), 270, 328 
Lagrange multiplier, 328 
Laplace (1749-1827), 429 
Laplace transform, 429-32, 436, 441 
Least upper bound (LUB), 39, 164, 
232, 243, 450 
Legendre (1752-1833), 439 
Leibniz (1646-1716), 250, 262, 270
Limit for e, 340-1 
for log x: x9: οὔ, 344-5 
for π, 262
for sin x/x, 347
of functions, 250—5, 464-5 
of sequences, 242-50, 256 
Limit process and stages, 246-8, 251-2, 462-3
Linear algebra, 364-7, 480-8 
combinations, 365, 367, 369, 481 
equations, 17, 366, 375-6, 393-8, 
494-5 
forms, 364, 403 
models, 432—4, 438 
transformations, 187-9, 365-6, 372-6, 
381, 393, 485-7 
Linearity, 364, 403, 406, 438 
Linearly dependent and independent, 
367-8, 370, 372, 391, 481 
Lobachevsky (1793-1856), 212 
Locus (curve), 214—20 
circle, 214-15, 220, 227-8 
conic section, 232 
ellipse, 232 
parabola, 230 
tangent to, 268, 271, 295, 348 
Logarithmic function, 15, 337-9, 474—5 
Logarithms, common, 15-16, 342-3, 
491-2 
natural (Naperian), 342—3 
Logically true and false, 116 


Mappings, 179-81, 234-5, 261, 365-6, 372-3
Marginal rate of change, 267, 269
revenue and cost, 267, 297
Mathematical induction, 47, 48, 101, 444


Matrices, and vectors, 377, 390, 399 
as vector spaces, 380, 389 
concept, 366, 377 
orthogonal, 388, 401 
rank, 390-3, 487—8 
rules, 379-81 
square and rectangular, 384 
singular, 387 
symmetric and skew symmetric, 388, 
400 
Maximum and minimum values, 297, 
305-8, 327, 466-7, 495 
constrained, 327-8 
Mean Value Theorems, 300—5 
Modulo n, 52 
Modulus, 23, 43, 357 
Moore, E. H. (with H. L. Smith), 250 
Murdoch, D. C., 24 


Napier (1550-1617), 342 
Natural number, 46, 104, 182, 443-5 
Necessary and sufficient conditions, 120-3
Negation, 114, 192 
Neighbourhood, 238, 251 
Newton (1642-1727), 250, 262 
Normal distribution, 337, 359 
Notation, derivatives, 270 
determinants, 387 
functions, 178, 236, 241 
generally, 18-24 
integrals, 281 
matrices, 377 
sets, 88 
sigma (Σ), 20-2, 279-80, 378
transformations, 154
Number e, 315, 330, 333, 336, 340-1, 357, 475-6
i, 41, 44-5, 357
π, 59, 262, 334, 349, 355, 357, 477 
Number systems, binary, 55-8 
cardinal, 98-101, 104, 109 
complex, 40-5, 71, 159, 356-7, 453-4, 
497 
integers, 46—9, 146, 182—4, 443-6 
octal, 58, 67 
rationals, 27-32, 49-51, 106, 148, 159, 
164, 174, 446-7 
real, 35-40, 108, 159, 164, 364, 370, 
447-50 


Octals, 58, 61 

Odds, 134 

Operational rules, algebra, 29-31 
derivatives, 272—4 




groups, 144, 459 
inequalities, 165 
integrals, 289-91 
Laplace transforms, 431 
matrices, 379-81 
order, 32 
scalar products, 205 
sets, 92—4, 457 
Operators, 22, 154, 292-5, 419-23, 428, 441
Order, 31-2, 38—40, 47, 50, 162—4, 191-5, 
201, 445, 447 
Ordered pairs, 43, 171, 364, 370 
Oscillations, 246, 411, 415-19, 427, 
434-8 


Parameters, 13, 260

Particular integral, 409, 423-4

Partition, of set or interval, 175, 192, 
277, 279, 467 

Pascal (1623-62), 138 

Pascal’s triangle, 139 

Peano (1858-1932), 443 

Pencil of lines, 231 

Period and phase, 417-18, 427, 437 

Permutations, 153, 180 

Place-holder (marker), 12, 44, 49, 68 

Poincaré (1854-1912), 181, 214 

Polar co-ordinates, 210, 416, 500 

Polynomials, 66-83, 86-7, 106-7, 189, 
398, 450-2, 492 

Population growth, 361 

Power functions, 15, 343-4, 489-90

Power series, 316-18, 323-6, 350-2, 473-4

of complex variable, 350-2 
Prime field, 158 
number, 63, 65 

Primitive (undefined), 46, 89, 162, 171, 
443, 455 

Probability, 123-35 

Projection, 221 


Quadratic form, 367

Quadratics, 9, 12-14, 16, 42, 61, 66-7, 
365, 371, 492, 495 

Quaternions, 170 

Quotient field, 50, 70, 446 


Radians, 354 
Range, 8, 172, 178, 234 
Rate of change, 266-7, 269 
proportionate, 339 
Rational fractions, 70-1, 160, 419-20, 
452-3 


Reductio ad absurdum, 120 

Reflexive, 99, 109, 140, 457-8 

Relations between sets, 171-3 
equivalence, 99, 173—7, 192 
ordering, 191-5 

Remainder in Taylor’s series, 304, 326 
theorem, 75 

Residue classes, 86, 168, 196 

Resonance, 440 

Riemann (1826-66), 212, 468 

Riemann sum and integral, 468—9 

Ring, 157, 389, 460 


Roots of equations, 12-14, 16, 66, 74-80, 263, 454, 492-4
multiple, 75, 80, 85
Rotation through right-angle (number i), 45, 357


Sawyer, W. W., 24
Scalar products, 70, 84, 208 ὅ, 367-9, 
380, 451, 461-2 
Schaaf, W. L., 24 
Sequences, 36, 242—50, 256 
Sets, and probability, 126 
and relations or functions, 172, 178 
and statements, 118—20 
concept, 5, 7-18, 88-91, 455-6 
empty (nul), 92, 456 
finite and infinite, 90, 100-11, 457—8 
isomorphic, 182 
of subsets, 147, 149, 456-8 
ordered, 162, 191-5 
reflexive, 109, 140, 457-8 
rules, 92—4, 457 
solution, 11-12, 166—7, 215 
truth, 118, 126 
universal, 92, 456 
Shift operator H, 428, 441 
Signs, rule of, 445 
Sinusoidal functions, 417-19, 427, 434-5, 
437 
Slope of line or tangent, 219, 228, 268, 
271, 348 
Span, 368, 372, 481 
Speed, 264-6 
Statements, and logic, 6, 114-20 
and probability, 123-6 
consistent, 141 
exclusive and independent, 127, 129-32 
implication and equivalence, 117, 
120-2 
of relations and functions, 172, 178 
simple and compound, 114 
Stationary values, 307 
Step-function, 196, 216-17 




Stochastic process, 135-9 
Subset, subgroup and subfield, 90, 145, 158, 455-7


Tangent and inverse tangent, 348-9, 
477-8, 480 
Tangent and normal to curve, 268, 271, 
295, 348 
Taylor’s theorem and series, 302-4, 
315-16, 318-19, 326, 405 
Transcendentals, 161 
Transfer function, 436 
Transformations, cartesian to polar 
co-ordinates, 500-1 
conformal, 189-91, 350 
group of, 153-7, 188-9 
linear, 187-9, 365-6, 372-6, 381, 393, 
485-7 
orthogonal, 402 
rigid motions, 199, 201, 228, 230 
translations and movements, 150-3, 
155-6, 169 
Transpose (matrix), 383 
Tree diagram, 135 
Trigonometric ratios, 210, 219, 354-6, 497-9
Truth sets and tables, 118, 126 


Union (sets), 92, 147, 456 
Unity, cube and nth roots, 60, 81-3, 
149, 156-7, 185 




Variable, 8, 12, 234, 256, 265 
Vector space, algebraic concept, 204-6, 
364-5, 367-9, 461-2, 480-3 
and geometric space, 204, 206 
distance and angle, 207, 212
Euclidean, 207-11, 485 
of n-tuples, 207, 211, 365, 369, 372, 
483-5 
Vectors, additive group, 203, 228, 367, 
461-2 
algebraic and geometric, 208, 229, 265, 
364 
and complex numbers, 43, 184, 364-5 
length and direction, 210 
row and column, 377 
Velocity, 265—6 
Venn diagrams, 92, 118, 457 
Vieta (1540-1603), 493 


Weierstrass (1815-97), 263, 466 

Weighting function, 432 

Whitehead, A. N., 24 

Whittaker, E. T. (with G. N. Watson), 
334 


Zermelo (1871-1953), 89
Zero divisors, 31, 54, 97, 383 
matrix, 378, 383, 389 
of number system, 28, 446 
of polynomial (function), 12-13, 74 
vector, 203, 204, 207, 367 


PRINTED IN GREAT BRITAIN BY 
ROBERT MACLEHOSE AND CO. LTD 
THE UNIVERSITY PRESS, GLASGOW 








